Abstract — An important receiver operation is to detect the 
presence of specific preamble signals with unknown delays and 
in the presence of scattering, Doppler effects and carrier offsets. 
This task, referred to as "link acquisition", is typically a sequen- 
tial search over the transmitted signal space. Recently, many 
authors have suggested applying sparse recovery algorithms in 
the context of similar estimation or detection problems. These 
works typically focus on the benefits of sparse recovery, but not 
generally on the cost reduction brought by compressive sensing. 
Thus, our goal is to examine the trade-off in complexity and 
performance that is possible when using sparse recovery. To 
do so, we consider a sequential Compressive Sparsity-Aware 
(C-SA) acquisition scheme, where a Compressive Multi-channel 
Sampling (CMS) module is followed by a Sparsity Regularized 
(SR) Likelihood Ratio Test (LRT) receiver. 

Our C-SA acquisition scheme borrows insights from the 
models studied in the context of sub-Nyquist sampling, where 
a minimal amount of samples is used to reconstruct signals with 
Finite Rate of Innovation (FRI). In particular, we propose an 
A/D conversion front-end that maximizes the Kullback-Leibler 
distances among the hypotheses of the SR-LRT performed on 
the samples. We compare the proposed acquisition scheme vis- 
a-vis conventional alternatives with relatively low computational 
cost, such as the Matched Filter (MF), in terms of performance 
and complexity. Our experiments suggest that one can use the 
proposed C-SA scheme to scale down the receiver implementation 
cost, with greater flexibility than conventional MF architectures. 
However, we find that they both have overall complexities that 
scale linearly with the search space and that compressive mea- 
surements used in the SR-LRT at low SNR lead to a performance 
loss, as one could expect given that they use fewer observations. At 
high SNR, on the other hand, the SR-LRT has better performance 
in spite of the compression. 

Index Terms — Multiuser communications, compressed sensing, 
sparse recovery, detection and estimation. 

I. Introduction 

One of the critical receiver tasks in a multiuser scenario, 
referred to as link acquisition, is that of detecting the presence 
of signals, and identifying the link parameters (e.g., delays, 
Doppler) of an unknown subset $\mathcal{I}$ of active sources out of 
$I$ possible ones. Similar to [1], [2], we consider the case in 
which the set of active users $\mathcal{I}$ transmits known and distinct 
training preambles $\phi_i(t)$, $i \in \mathcal{I}$. Usually these preambles 
are fairly long, so that their energy at the receiver can rise 
above the noise. In the initial phase the receiver is completely 
agnostic about sources surrounding it and it tests the sensed 
signal $x(t)$ until it detects the presence of such signals, to 
establish the active links. This needs to be done by accumulating 
observations and repeating the test sequentially. The acquired 
link information is essential for identifying the basic features 
of the active signal spaces, so that the receiver can determine 
if it can decode the data after the training phase (3), Q 
and refine the link parameter estimates using mid-ambles and 
decoded data. The term "link acquisition" is equivalent to 
resolving the received signal space, which is characterized by 
the propagation delays and the Doppler frequencies. 

A. Related Works on Link Acquisition of Multiuser Signals 

We can classify the algorithms that are used for link 
acquisition into two main groups. The first category acquires 
a sufficient statistic by directly sampling x(t) at (or above) the 
Nyquist rate. The likelihood function is then exploited to detect 
the presence of signals and determine the link parameters in 
the model given the set of active users I. We refer to such 
techniques as Direct Sampling (DS) methods (e.g., [5]-[8]). 

A second approach, referred to as the Matched Filtering 
(MF) [4], [9], [10], facilitates the search of both the active 
set $\mathcal{I}$ and link parameters by comparing the filtered outputs 
of the signal $x(t)$ from a bank of filters $\phi_i(t-\tau)e^{i\omega t}$, each 
matching a sufficiently wide collection of points in the full 
parameter set $\mathcal{T} \times \mathcal{F}$, where $\tau \in \mathcal{T}$ and $\omega \in \mathcal{F}$ are the delay 
and Doppler frequency, respectively. MF is a prevalent choice 
in hardware implementations because of its simplicity. The 
MF approach can be implemented in two ways: in the digital 
domain as a DS method, where samples are projected onto the 
sampled version of $\phi_i(t-\tau)e^{i\omega t}$, or directly in hardware, as 
a multi-channel sampling structure, performing the projections 
onto the templates $\phi_i(t-\tau)e^{i\omega t}$ via analog filters. Specific 
details on these algorithm architectures are provided in Section III. 

Classical algorithms take little advantage of the low di- 
mensionality of the received signal space in storing the 
observations that are processed or to improve the detection 
performance. Recently, there have been advances in exploiting 
sparsity, or the low dimensionality of the signal space, to 
improve receiver performance. One class of papers suggests 
using sparse signal recovery for the purpose of either detection 
or estimation. For instance, assuming that the signal is present, 
[1], [2], [4], [11] deal with identification of the active users 
and/or estimation of signal parameters by creating a dictionary 
from the known templates $\phi_i(t)$ and viewing the signal $x(t)$ 
as a sparse linear combination of these element templates 
inside the dictionary. Without knowledge of signal presence 
within a specific observation window, the proposed detection 
schemes in [12]-[16] use generic compressed measurements 
to detect the presence of certain signals, starting from an 
abstract discrete model. We call this class of methods Sparsity- 
Aware (SA). In these papers, Doppler shifts and delays are not 
explicitly considered and the discrete observations are treated 
independently as a single snapshot, upon which SA algorithms 
are applied. 

B. Multiuser Signals with Finite Rate of Innovation (FRI) 

What is often neglected in existing SA approaches is the 
acquisition of informative low rate discrete samples from the 
analog domain. As we mentioned, preamble sequences are 
usually fairly long and the receiver needs to sample the signal 
x(t) at a fast rate and store them prior to processing. This can 
become a bottleneck in designing preambles so that they have 
the appropriate energy. 

Reducing the sampling rate and the associated storage 
incurred at the A/D front-end is mostly the concern of another 
broad class of papers [17]-[21] on signals with a Finite Rate of 
Innovation (FRI) [22]. In general, an FRI model has a sparse 
parametric representation. Given the preamble $\phi_i(t)$ for each 
active user $i \in \mathcal{I}$ traveling through $R$ multipath channels, the 
class of signals $x(t)$ lies in a subspace with no more than 
$|\mathcal{I}|R$ dimensions, where each dimension has three unknowns 
(e.g., delay, Doppler, channel coefficient), irrespective of its 
bandwidth and duration. 

The premier objective of FRI sampling architectures is 
A/D conversion at sub-Nyquist rates (i.e. deterministic signal 
reconstruction). This objective is fundamentally different from 
what is of practical interest in link acquisition, which is 
to perform statistical inference. In this paper, we wish to 
harness similar complexity gains as in the FRI literature, while 
mitigating the detection performance losses that arise in the 
presence of noise due to the reduced number of observations. 
To this aim, we formulate the link acquisition problem as a 
Sparsity Regularized (SR) Likelihood Ratio Test (LRT) that 
uses compressive samplers maximizing the average Kullback- 
Leibler (KL) distance among the hypotheses in the test. We 
refer to the compressive samplers designed in this paper as 
the Compressive Multichannel Sampling (CMS) architecture 
and the proposed link acquisition scheme as the Compressive 
Sparsity-Aware (C-SA) scheme. More specifically, we discuss 
in this paper 

1) a unified low-rate CMS architecture for the C-SA 
acquisition scheme using the proposed SR-LRT; 

2) the sequential SR-LRT that jointly detects signal pres- 
ence and recovers the active users with their parameters; 

3) the compressive samplers of the C-SA scheme that 
provide the maximum average Kullback-Leibler (KL) 
distance of the SR-LRT; 

4) the comparison of the proposed architecture with the 
MF approach in terms of performance, storage cost and 
computational complexity. 

This bridges the results pertaining to sparsity-aware 
estimation/detection [1], [2], [4], [12]-[16], the literature on analog 
compressed sensing and sub-Nyquist sampling [4], [8], [10], 
[17], [18], [23] and FRI sampling [19], [20], [22] such that 
sampling and estimation/detection operations are considered 
jointly. 

To measure the benefits of the proposed C-SA scheme over 
other schemes, we analyze the practical trade-off between the 
implementation costs spent in acquiring samples and those 
invested in sparse recovery. This is important to clarify the 
potential benefits of sub-Nyquist architectures in communi- 
cation receivers in the link acquisition phase. These schemes 
often benefit from the denoising capabilities of SA algorithms 
(as well documented in [1], [2], [4], [12]-[16]) but must 
lose sensitivity because they do not use sufficient 
statistics for the receiver inference. 

The question we consider is, therefore, what is there to 
gain: hardware, storage, complexity or performance? Our 
numerical experiments indicate that the main advantage of 
the proposed scheme is that it enables the designer to find 
an adequate operating point for link acquisition such that 
processing requirements and complexity of the receiver can be 
reduced to an acceptable level without significantly sacrificing 
acquisition performance compared with the MF architecture. 
We also confirm numerically that the compressive samplers we 
propose in the CMS architecture harvest highly informative 
samples for the SR-LRT in terms of estimation and detection 
performance. 



C. Notation and Paper Organization 

We denote vectors and matrices by boldface lower-case and 
boldface upper-case symbols and the set of real (complex) 
numbers by $\mathbb{R}$ ($\mathbb{C}$). We denote sets by calligraphic symbols, 
where the intersection and the union of two sets $\mathcal{A}$ and $\mathcal{B}$ 
are written as $\mathcal{A} \cap \mathcal{B}$ and $\mathcal{A} \cup \mathcal{B}$ respectively. The 
operator $|\mathcal{A}|$ on a discrete (continuous) set takes the cardinality 
(measure) of the set. The magnitude of a complex number 
$x$ is denoted by $|x| = \sqrt{x x^*}$, where $x^*$ is the conjugate of 
the complex number $x$. The transpose, conjugate transpose, 
and inverse of a matrix $\mathbf{X}$ are denoted by $\mathbf{X}^T$, $\mathbf{X}^H$ and 
$\mathbf{X}^{-1}$, respectively. The inner products between two vectors 
$\mathbf{x}, \mathbf{y} \in \mathbb{C}^{W \times 1}$ and between two continuous functions $f(t), g(t)$ 
in $L_2(\mathbb{C})$ are defined accordingly as $\langle \mathbf{x}, \mathbf{y} \rangle = \sum_{n=1}^{W} y_n^* x_n$ 
and $\langle f(t), g(t) \rangle = \int g^*(t) f(t)\,dt$. The $\mathbf{W}$-weighted $\ell_2$-norm 
of a vector $\mathbf{x}$ is denoted by $\|\mathbf{x}\|_{\mathbf{W}} = \sqrt{\mathbf{x}^H \mathbf{W} \mathbf{x}}$, and 
the conventional $\ell_2$-norm is written as $\|\mathbf{x}\|$. The $L_2$-norm 
of a continuous-time signal $f(t) \in L_2(\mathbb{C})$ is computed as 
$\|f(t)\| = \sqrt{\langle f(t), f(t) \rangle}$. 
The paper is organized as follows. Section II introduces 
our observation model. We discuss related works on link 
acquisition in Section III. The CMS architecture for 
compressive acquisition is considered in Section IV. Using the 
compressive samples obtained from the CMS architecture, 
we develop the SR-LRT algorithm for the C-SA link acquisition 
scheme in Section V. We then optimize the compressive 
samplers in the CMS in Section VI. Simulations demonstrating 
the performance are presented in Section VII. The overall 
cost of the proposed C-SA scheme using the CMS module 
is compared against conventional MF schemes in terms of 
computational complexity and storage in Section VIII. 

II. Sequential Link Acquisition 



In every communication standard, a key control sequence in 
the training phase is the initial preamble. The receiver models 
the corresponding observation by assuming that each user $i \in \mathcal{I}$ 
from the unknown active set transmits a specific preamble 
$\phi_i(t)$. This transmission is followed by the mid-ambles 
and data frames. A common choice for such a preamble 
in multiuser communications is a linearly pulse modulated 
sequence with a chip rate $1/T$ close to the signal bandwidth 
and equal to the minimum Nyquist rate 

$$\phi_i(t) = \sum_{m=0}^{M-1} a_i[m]\, g(t - mT).$$ (1) 

Here $g(t)$ is the pulse shaping filter (chip) and $\{a_i[m]\}_{m=0}^{M-1}$ is 
typically a long preamble sequence, $M \gg 1$, for each user. 
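
The pulse-modulated preamble in (1) can be illustrated with a minimal sketch. The rectangular chip pulse `rect_pulse`, the four-chip sequence and the chip period below are assumptions chosen for illustration only; the paper does not specify $g(t)$, and practical preambles are far longer.

```python
def rect_pulse(t, T):
    """Hypothetical chip pulse g(t): 1 on [0, T), 0 elsewhere (an assumption)."""
    return 1.0 if 0.0 <= t < T else 0.0


def preamble(t, chips, T, g=rect_pulse):
    """Evaluate phi_i(t) = sum_{m=0}^{M-1} a_i[m] g(t - m T) at time t."""
    return sum(a * g(t - m * T, T) for m, a in enumerate(chips))


chips = [1, -1, -1, 1]   # toy sequence a_i[m] with M = 4
T = 1.0                  # chip period
# Sample the waveform at the centre of each chip interval.
samples = [preamble((m + 0.5) * T, chips, T) for m in range(len(chips))]
print(samples)
```

With a pulse confined to one chip interval, sampling at the chip centres returns the chip sequence itself, and the waveform vanishes outside $[0, MT)$.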

Then the observation at the receiver can be written as 

$$x(t) = \sum_{i \in \mathcal{I}} \sum_{r=1}^{R} h_{i,r}\, \phi_i(t - t_{i,r})\, e^{i\omega_{i,r} t} + v(t),$$ (2) 

where $t_{i,r}$ is the unknown propagation delay of the $i$th user 
in the $r$th multipath, $|\omega_{i,r}| \le \omega_{\max}$ is the Doppler frequency 
upper bounded by the maximum Doppler spread $\omega_{\max}$, and 
$h_{i,r}$ is the unknown channel fade. Without loss of generality, 
we assume that the maximum multipath order $R$ is known 
and the noise component $v(t)$ is a white Gaussian process 
with $E\{v(t)v^*(s)\} = \sigma^2 \delta(t - s)$. Our problem is to detect the 
presence of the active user set $\mathcal{I}$ and the corresponding link 
parameters $\{h_{i,r}, t_{i,r}, \omega_{i,r}\}$ for $i \in \mathcal{I}$ and $r = 1, \dots, R$. 

Since the propagation delays $t_{i,r}$ are unknown and possibly 
large (infinite if there is no signal transmitted at all), the 
typical A/D front-end for link acquisition is sequential. The 
acquisition scheme produces test statistics every $D$ units of 
time, where $D$ is the shift in the time reference for detections. 
At every shift $t = nD$, the receiver decides whether the signal 
$x(t)$ is present at or after $t = nD$. For convenience, we denote 
$t_0 = \min_{i,r} t_{i,r}$ as the delay of the first arrival path among all 
users. Let 

$$\ell = \lfloor t_0 / D \rfloor$$ (3) 

be the value of $n = \ell$ that most likely produces a positive 
detection and 

$$\tau_{i,r} = t_{i,r} - \ell D$$ (4) 

be the composite delay. Clearly, we have $0 \le \tau_{i,r} \le \tau_{\max}$, 
where $\tau_{\max}$ is the composite delay spread. Note that the composite 
delay of the first arrival can fall anywhere within $[0, D)$ and thus 
$\tau_{\max} \ge D$. This allows us to express (2) equivalently and uniquely as 

$$x(t) = \sum_{i \in \mathcal{I}} \sum_{r=1}^{R} h_{i,r}\, \phi_i(t - \ell D - \tau_{i,r})\, e^{i\omega_{i,r} t} + v(t).$$ (5) 

After these considerations, it is clear that the search spaces of 
delays and Doppler frequencies for each shift $n$ are 
respectively $\mathcal{T} = [0, \tau_{\max}]$ and $\mathcal{F} = [-\omega_{\max}, \omega_{\max}]$. 
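
The change of variables in (3) and (4) is simple arithmetic, sketched below. The function name `composite_delays` and the numerical delays are hypothetical; only the two formulas come from the text.

```python
import math


def composite_delays(delays, D):
    """Return (ell, taus) with ell = floor(t0 / D) and tau = t - ell * D.

    delays holds the propagation delays t_{i,r}; D is the shift period.
    """
    t0 = min(delays)                 # first arrival among all users and paths
    ell = math.floor(t0 / D)         # shift index in (3)
    taus = [t - ell * D for t in delays]   # composite delays in (4)
    return ell, taus


ell, taus = composite_delays([7.25, 8.5, 10.0], D=2.0)
print(ell, taus)
```

In this toy case the first arrival at 7.25 falls in the shift starting at $3D = 6$, so its composite delay 1.25 lies in $[0, D)$, while later paths may exceed $D$.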



A. Goal of Link Acquisition 

Note that there could be multiple values of $n \ne \ell$ that lead 
to valid positive detections, where for a given $\ell$, the relative 
composite delay with respect to the $n$th shift would be 

$$\tau_{i,r}^{(n)} = \tau_{i,r} - (n - \ell) D.$$ (6) 

In order to single out the best reference shift, the receiver 
will have to compare a sequence of $N_0$ test statistics after the 
first positive detection at $n = N_\eta$, and choose the particular 
shift $\ell_\star$ that maximizes the likelihood ratio between the signal 
hypothesis and the noise hypothesis. We call $\ell_\star$ the maximum 
likelihood ratio (MLR) shift. The look-ahead horizon $N_0$ 
can be chosen considering the type of sampling kernels, 
the preambles $\phi_i(t)$, and the delay spread $\tau_{\max}$, making 
reasonable approximations about the durations of the signals. 

Definition 1. Link acquisition refers to 

1) locating the MLR shift $\ell_\star$; 

2) identifying the set of active users $\mathcal{I}$ in the $\ell_\star$th shift; 

3) resolving the delay-Doppler pairs $\{\tau_{i,r}, \omega_{i,r}\}$ for $i \in \mathcal{I}$ 
and $r = 1, \dots, R$. 

Usually, the preamble signals $\phi_i(t)$ have large energy, so 
that they can rise above the receiver noise. Given that the 
average power is constant, the $\phi_i(t)$ typically last much longer 
(i.e., $M$ is large) than subsequent mid-ambles or spreading 
codes that modulate data. For a typical wireless application 
such as GPS or IS-95/IMT-2000, transmitters continuously 
send out preamble sequences with length on the order of 
$M = 20 \times 1023$ [24] or $M = 32768$ [25], respectively. This 
means that in order to detect the presence of such preambles 
and acquire the synchronization parameters, architectures 
using DS or MF approaches would have to store a large 
amount of data to process in a sequential manner. This phase is 
crucial to properly initialize any channel tracking that ensues. 
In Section III, we provide details on the A/D architectures 
and the corresponding post-processing for conventional link 
acquisition schemes. We then present the proposed CMS 
architecture and the C-SA link acquisition scheme in Section IV. 

III. Existing Architectures for Link Acquisition 

For future use, we let the Nyquist rate of the signal $x(t)$ 
be $f_{\mathrm{NYQ}} = 2W + \omega_{\max}/\pi$ with $W$ being the maximum 
single-sided bandwidth of $\phi_i(t)$, $i = 1, \dots, I$. 

A. Direct Sampling (DS) 

In DS schemes, the received analog signal $x(t)$ is sampled 
by projecting it onto an ideal series of Dirac deltas, every 
$T_s = 1/f_s \le 1/f_{\mathrm{NYQ}}$, i.e. 

$$c_{\mathrm{DS}}[w] = \langle x(t), \delta(t - wT_s) \rangle = x(wT_s).$$ (7) 

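At the $n$th shift, DS schemes use the most recent $W$ Nyquist 
samples for every shift $D = NT_s$ 

$$\mathbf{c}_{\mathrm{DS}}[n] = \big[c_{\mathrm{DS}}[nN], \dots, c_{\mathrm{DS}}[nN + (W - 1)]\big]^T$$ (8) 

to perform the detection. Based on (5), the samples $\mathbf{c}_{\mathrm{DS}}[n]$ can 
be expressed as 

$$\mathbf{c}_{\mathrm{DS}}[n] = \boldsymbol{\Phi}_{\mathcal{J}}(\boldsymbol{\tau}_{\mathcal{J}}, \boldsymbol{\omega}_{\mathcal{J}})\, \mathbf{h}_{\mathcal{J}} + \mathbf{v}[n],$$ (9) 

where $\mathcal{J} \subseteq \{1, \dots, I\}$ is the set of active users at the $n$th 
shift, and $\boldsymbol{\tau}_{\mathcal{J}}$, $\boldsymbol{\omega}_{\mathcal{J}}$ and $\mathbf{h}_{\mathcal{J}}$ 
represent the $|\mathcal{J}|R$ residual delays, Doppler frequencies and channel 
coefficients corresponding to the set of users $\mathcal{J}$ in that shift. 
The vector $\mathbf{v}[n]$ contains the noise samples $[\mathbf{v}[n]]_w = v(nD + wT_s)$, 
and $\boldsymbol{\Phi}_{\mathcal{J}}(\boldsymbol{\tau}_{\mathcal{J}}, \boldsymbol{\omega}_{\mathcal{J}})$ is a $W \times |\mathcal{J}|R$ sub-matrix of the 
complete $W \times IR$ matrix $\boldsymbol{\Phi}(\boldsymbol{\tau}, \boldsymbol{\omega})$, from which we extract the 
columns of the users $j \in \mathcal{J}$. The full matrix $\boldsymbol{\Phi}(\boldsymbol{\tau}, \boldsymbol{\omega})$ is defined by 

$$[\boldsymbol{\Phi}(\boldsymbol{\tau}, \boldsymbol{\omega})]_{w,(i-1)R+r} \triangleq \phi_i(wT_s - L_g T - \tau_{i,r})\, e^{i\omega_{i,r} w T_s},$$ 

where $L_g$ is the number of truncated side-lobes of $g(t)$ on one 
side. 

Using the $n$th shift of observations at $t = nD$, link 
acquisition amounts to performing the following multiple composite 
hypothesis test 

$$\mathcal{H}_{\mathcal{J}} : \mathbf{c}_{\mathrm{DS}}[n] = \boldsymbol{\Phi}_{\mathcal{J}}(\boldsymbol{\tau}_{\mathcal{J}}, \boldsymbol{\omega}_{\mathcal{J}})\, \mathbf{h}_{\mathcal{J}} + \mathbf{v}[n],$$ 
$$\mathcal{H}_0 : \mathbf{c}_{\mathrm{DS}}[n] = \mathbf{v}[n],$$ 

with unknown parameters $\mathcal{J}$, $\boldsymbol{\tau}_{\mathcal{J}}$, $\boldsymbol{\omega}_{\mathcal{J}}$ and $\mathbf{h}_{\mathcal{J}}$. 

The Generalized Likelihood Ratio Test (GLRT) is then 
typically used. This test requires solving the non-linear least 
squares estimation (NLLSE) 

$$\{\hat{\mathcal{J}}, \hat{\boldsymbol{\tau}}_{\hat{\mathcal{J}}}, \hat{\boldsymbol{\omega}}_{\hat{\mathcal{J}}}, \hat{\mathbf{h}}_{\hat{\mathcal{J}}}\} = \arg\min \big\| \mathbf{c}_{\mathrm{DS}}[n] - \boldsymbol{\Phi}_{\mathcal{J}}(\boldsymbol{\tau}_{\mathcal{J}}, \boldsymbol{\omega}_{\mathcal{J}})\, \mathbf{h}_{\mathcal{J}} \big\|^2$$ 

over all possible $\mathcal{J}$, $(\boldsymbol{\tau}_{\mathcal{J}}, \boldsymbol{\omega}_{\mathcal{J}}) \in \mathcal{T}^{|\mathcal{J}|R} \times \mathcal{F}^{|\mathcal{J}|R}$, $\mathbf{h}_{\mathcal{J}} \in \mathbb{C}^{|\mathcal{J}|R}$ to 
compute the generalized likelihood ratio 

$$\eta_{\mathrm{DS}}(n)$$ (13) 

with estimates $\{\hat{\mathcal{J}}, \hat{\boldsymbol{\tau}}_{\hat{\mathcal{J}}}, \hat{\boldsymbol{\omega}}_{\hat{\mathcal{J}}}, \hat{\mathbf{h}}_{\hat{\mathcal{J}}}\}$ obtained at every shift $t = nD$. 
The expression of the generalized likelihood ratio is given 
in [26] for cases when the noise variance $\sigma^2$ of $\mathbf{v}[n]$ is known 
and unknown. Using the corresponding ratio as test statistics, 
the receiver checks if the test statistic satisfies $\eta_{\mathrm{DS}}(n) > \eta_0$ 
for some properly chosen threshold $\eta_0 > 1$. Without loss of 
generality, we consider the most general case where $\sigma^2$ is 
unknown. In this case, the generalized likelihood ratio has the 
following expression 

$$\eta_{\mathrm{DS}}(n) = \frac{\| \mathbf{c}_{\mathrm{DS}}[n] \|^2}{\big\| \mathbf{c}_{\mathrm{DS}}[n] - \boldsymbol{\Phi}_{\hat{\mathcal{J}}}(\hat{\boldsymbol{\tau}}_{\hat{\mathcal{J}}}, \hat{\boldsymbol{\omega}}_{\hat{\mathcal{J}}})\, \hat{\mathbf{h}}_{\hat{\mathcal{J}}} \big\|^2}.$$ (14) 

Denote the first shift that passes the GLRT as 
$N_\eta = \min \{ \arg_n\, \eta_{\mathrm{DS}}(n) > \eta_0 \}$, then the MLR shift $\ell_\star$ is given by 

$$\ell_\star = \arg\max_n\, \eta_{\mathrm{DS}}(n), \quad n = N_\eta, \dots, N_\eta + N_0.$$ (15) 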



Obviously, the test described above is intractable in general, 
since there are $2^I$ hypotheses at each shift $t = nD$ to explore, 
and for each of them, there is an NLLSE problem to solve. 
Therefore in practice, DS acquisition schemes either deal 
with the known user case $I = 1$ or assume the full set 
$\mathcal{J} = \{1, \dots, I\}$ during detection, followed by NLLSE for 
that specific user set. 

When the set of active users $\mathcal{I}$ is unknown, alternatives are 
Matched Filtering (MF) and Sparsity-Aware (SA) approaches. 
The C-SA scheme in this paper is an instance of the SA 
technique, which performs sequential detection and estimation 
using sub-Nyquist samples from the proposed CMS 
architecture. We next describe the MF approach and then the SA 
method. 



B. Matched Filtering (MF) 

The MF receiver is a widely used architecture in practice 
because it finds the active parameters by simply 
observing the filtered outputs of the MF filterbank, which is 
constructed from the MF templates $\phi_i(t)e^{ik\Delta\omega t}$ for some $i$ 
and $k$. Clearly, the size of the filterbank has to be finite. 
Therefore, it is usually assumed that $\tau_{i,r} \approx q_{i,r}\Delta\tau$ and 
$\omega_{i,r} \approx k_{i,r}\Delta\omega$ for some integers $q_{i,r}$ and $k_{i,r}$ with a certain 
resolution $\Delta\tau = \tau_{\max}/Q$ and $\Delta\omega = \omega_{\max}/K$. The search 
spaces for the MF receiver then become $\mathcal{Q} = \{0, 1, \dots, Q-1\}$ 
and $\mathcal{K} = \{-K, \dots, K\}$, which are the discrete counterparts of 
the continuous search space $\mathcal{T} \times \mathcal{F}$. 
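
The discretization above can be sketched as follows. The helper names `mf_grid` and `quantize`, the rounding-to-nearest rule, the clamping at the grid edges and all numerical values are assumptions for illustration; the text only states the resolutions and the index sets.

```python
def mf_grid(tau_max, omega_max, Q, K):
    """Return the resolutions and index sets of the MF search grid."""
    d_tau = tau_max / Q               # delay resolution
    d_omega = omega_max / K           # Doppler resolution
    q_set = list(range(Q))            # {0, 1, ..., Q-1}
    k_set = list(range(-K, K + 1))    # {-K, ..., K}
    return d_tau, d_omega, q_set, k_set


def quantize(tau, omega, d_tau, d_omega, Q, K):
    """Map a continuous (tau, omega) pair to the nearest grid indices (q, k).

    Rounding to the nearest point and clamping are assumptions of this sketch.
    """
    q = min(max(round(tau / d_tau), 0), Q - 1)
    k = min(max(round(omega / d_omega), -K), K)
    return q, k


d_tau, d_omega, q_set, k_set = mf_grid(tau_max=8.0, omega_max=4.0, Q=16, K=4)
q, k = quantize(3.1, -1.8, d_tau, d_omega, Q=16, K=4)
print(d_tau, d_omega, len(q_set), len(k_set), (q, k))
```

The grid has $|\mathcal{Q}| = Q$ delay bins and $|\mathcal{K}| = 2K + 1$ Doppler bins, which is what drives the filterbank size discussed next.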

The MF receiver is a popular choice for multiuser 
acquisition [3], for example, in GPS receivers [27] or CDMA 
receivers [25]. Its comparison with the C-SA acquisition 
scheme using the CMS architecture we propose in this paper 
is particularly insightful because, although the MF front-end 
requires a large filterbank, the post-processing of its outputs 
is very simple. One could certainly perform more complex 
post-processing to enhance its performance. For example, the 
Orthogonal Matching Pursuit (OMP) algorithm in our C-SA 
scheme can be applied on the MF outputs for this purpose. 
However, in that case, as illustrated in Section VIII, the 
resulting scheme will have much higher storage cost and 
computational complexity requirements. More importantly, the 
OMP technique can be directly applied to the Nyquist samples, 
as done in SA methods, making the MF stage superfluous.$^1$ 

The MF obtains the decision statistics by passing $x(t)$ 
through a bank of $P = I|\mathcal{K}|$ MF templates, and sampling 
the outputs every $\Delta\tau$. To be consistent with the sequential 
structure in (8), the MF shifts its templates every $D = NT_s$, 
and samples the outputs every $\Delta\tau = T_s \le 1/f_{\mathrm{NYQ}}$. The 
output of the $p$th MF with $p = (k + K)I + i$ for $k \in \mathcal{K}$ 
and $i = 1, \dots, I$ is obtained as 

$$c_{i,k}[w] = \big\langle x(t),\, \phi_i(t - wT_s)\, e^{ik\Delta\omega (t - wT_s)} \big\rangle.$$ (16) 

$^1$Strictly speaking, the OMP technique performs a MF stage in its first 
iteration. The subsequent iterations can be viewed as applying successive 
cancellation. 



Oftentimes, the filtering process is implemented in the digital 
domain using the samples $\mathbf{c}_{\mathrm{DS}}[n]$ in (8). For consistency, we 
proceed with the description in the analog domain. At the $n$th 
shift, the samples used for detections can be stacked into an 
$I|\mathcal{K}| \times |\mathcal{Q}|$ exhaustive MF output array 

$$\mathbf{C}_{\mathrm{MF}}[n] = \begin{bmatrix} c_{1,-K}[nN] & \cdots & c_{1,-K}[nN + Q - 1] \\ \vdots & \ddots & \vdots \\ c_{I,K}[nN] & \cdots & c_{I,K}[nN + Q - 1] \end{bmatrix}.$$ (17) 

Then the MF receiver uses $\mathbf{C}_{\mathrm{MF}}[n]$ as test statistics and 
performs the test on each user as follows 

$$|c_{i,k}[nN + q]| > \rho_i, \quad i = 1, \dots, I,\ k \in \mathcal{K},\ q \in \mathcal{Q},$$ 

where $\rho_i$ is the chosen detection threshold for each user. 
Denote the active user set at the $n$th shift as 

$$\mathcal{I} = \{ i : |c_{i,k}[nN + q]| > \rho_i,\ \forall i, k, q \}$$ 

and the shift $N_\eta = \min \{ \arg_n\, |c_{i,k}[nN + q]| > \rho_i,\ \forall i, k, q \}$. 
The MLR shift is then obtained as 

$$\ell_\star = \arg\max_n\, \max_{i,k,q} |c_{i,k}[nN + q]|, \quad n = N_\eta, \dots, N_\eta + N_0.$$ 

Given the multipath order $R$, the delay-Doppler pairs at the 
MLR shift are found as the $R$ strongest outputs $c_{i,k}[\ell_\star N + q]$ 
for all detected users $i \in \mathcal{I}$ over the MF search space $k \in \mathcal{K}$ 
and $q \in \mathcal{Q}$. For convenience, we denote the strongest path by 
$(k_{i,1}, q_{i,1})$ for the $i$th user $i \in \mathcal{I}$ and order the samples by 
magnitudes 

$$|c_{i,k_{i,1}}[\ell_\star N + q_{i,1}]| \ge |c_{i,k_{i,2}}[\ell_\star N + q_{i,2}]| \ge \cdots \ge |c_{i,k_{i,R}}[\ell_\star N + q_{i,R}]| \ge \cdots$$ (18) 

for each user $i \in \mathcal{I}$. The delay-Doppler pairs are identified as 
the set 

$$\mathcal{M}_i = \{ (k_{i,r}, q_{i,r}) : r = 1, \dots, R \},$$ (19) 

which give the following link parameters 

$$\hat{\tau}_{i,r} = q_{i,r}\Delta\tau, \quad \hat{\omega}_{i,r} = k_{i,r}\Delta\omega, \quad (k_{i,r}, q_{i,r}) \in \mathcal{M}_i.$$ 
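
The MF post-processing described above (threshold test per user, then the $R$ strongest outputs) can be sketched at a single shift as follows. The function name `mf_acquire`, the dictionary layout of the MF outputs and the toy values are hypothetical, and the search over shifts for the MLR shift is omitted.

```python
def mf_acquire(outputs, rho, R, d_tau, d_omega):
    """Detect users and resolve their R strongest delay-Doppler pairs.

    outputs maps (i, k, q) -> complex MF sample at one shift;
    rho maps user index i -> detection threshold.
    Returns {i: [(tau_hat, omega_hat), ...]} for detected users only.
    """
    per_user = {}
    for (i, k, q), c in outputs.items():
        per_user.setdefault(i, []).append((abs(c), k, q))
    links = {}
    for i, entries in per_user.items():
        # A user is declared active if its largest output exceeds rho[i].
        if max(entries)[0] > rho[i]:
            strongest = sorted(entries, reverse=True)[:R]
            links[i] = [(q * d_tau, k * d_omega) for _, k, q in strongest]
    return links


outputs = {
    (1, 0, 3): 5 + 0j, (1, 1, 7): 3j, (1, -1, 2): 0.5,
    (2, 0, 1): 0.4, (2, 1, 4): 0.2,
}
links = mf_acquire(outputs, rho={1: 2.0, 2: 2.0}, R=2, d_tau=0.5, d_omega=1.0)
print(links)
```

Only user 1 passes its threshold here, and its two strongest outputs map to the grid points $(q, k) = (3, 0)$ and $(7, 1)$. The simplicity of this step is what the text refers to as the advantage of MF post-processing.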

Although the MF approach shows an advantage in its 
post-processing and implementation, it has a few drawbacks: 

i) the size of the MF filterbank scales with the number of 
users $I$ and the size of the parameter set $|\mathcal{K}|$; 

ii) digital implementation requires high rate processing, 
increasing storage and pipelining$^2$ cost; 

iii) the MF samples $c_{i,k}[\cdot]$ contain the interference from 
different users and multipath components. 

$^2$Pipelining refers to timely processing of the samples that stream into the 
system per unit of time. 



During the link acquisition phase, the effect of interference 
(iii) is mitigated by using wideband pulses g(t). During the 
data detection phase, multipath and multiuser interferences are 
dealt with using a RAKE type receiver and the interference 
is tackled either by using linear multiuser receivers or, in 
some cases, using successive interference cancelation (SIC) or 
even maximum likelihood multiuser detection [3]. Typically, 
the complexities of these schemes for data detection grow 
rapidly with respect to the size of the MF filterbank (i) and 
the sampling rate (ii). Since this phase is conducted after 
the link acquisition, the uncertainties about the set of active 
users, their delays and Doppler frequencies have already been 
resolved and therefore, these tasks become more manageable. 

C. Sparsity Aware (SA) Approach 

Instead of simply observing and ranking the MF outputs, 
many recent works have proposed the idea of compressed 
sensing or sparse recovery to solve estimation and detection 
problems. For the purpose of user identification and parameter 
estimation, one approach is to approximate (9) by a sparse 
model with a dictionary constructed from the ensemble of 
possible templates $\phi_i(t)$ [1], [4] and/or discretized delays $\tau$ 
[2] (similar to the MF templates), where the joint recovery of 
active users and unknown parameters is relaxed as a sparse 
estimation problem. These sparse methods, which we call the 
Direct Sparsity-Aware (D-SA) scheme, usually assume the 
MLR shift to be known and require Nyquist rate samples. 
On the other hand, aiming at signal presence detection rather 
than identifying the active users and recovering the parameters, 
[12]-[16] reduce the number of samples required for the test 
by using a linear compressor on the block of given discrete 
observations, which we call the Compressive Sparsity-Aware 
(C-SA) scheme. In terms of the acquisition front-end, the 
D-SA scheme is a special case of the C-SA scheme with a 
compressor that is an identity matrix. 

In this paper, the proposed CMS architecture with SR-LRT 
resembles the C-SA scheme in certain features 
because of the compressive samples obtained in the 
front-end. Thus, to avoid any confusion, we also refer to the 
SR-LRT based on the CMS architecture in this paper as the 
C-SA scheme. However, a distinctive difference between the 
C-SA scheme in this paper and those in [12]-[16] is that the 
C-SA scheme proposed in this paper unifies the sequential 
signal detection, identification of active users and estimation 
of parameters by using the compressive samples obtained from 
a flexible multi-rate A/D architecture, while the C-SA scheme 
in [12]-[16] directly starts from an abstract discrete model that 
is already sampled. Last but not least, the sampling kernels 
in the proposed CMS architecture are further optimized with 
respect to the estimation and detection performance. 

IV. Compressive Sequential Link Acquisition 

A. Compressive Multichannel Sampling (CMS) 

We propose to use the A/D front-end in Fig. 1, typical in 
FRI sampling [17], [19], [20], [22], [23], for this work, which 
samples the signal every $t = nD$ by a $P$-channel filterbank 

$$c_p[n] = \langle x(t), \psi_p(t - nD) \rangle, \quad p = 1, \dots, P.$$ (20) 

[Figure 1: block diagram of the CMS front-end. The input $x(t)$ feeds $P$ parallel channels; the $p$th channel filters with $\psi_p^*(-t)$ and is sampled at $t = nD$ to produce $c_p[n]$.] 

Fig. 1. Samples obtained from compressive acquisition. 



We call this architecture the Compressive Multichannel 
Sampling (CMS) scheme. 

Note that (20) can also be implemented in the digital domain 
by performing linear projections of the discrete signal $\mathbf{c}_{\mathrm{DS}}[n]$ 
in (9). This means that the CMS architecture becomes part 
of the post-processing of the Nyquist samples of $x(t)$, which 
lowers the storage and computation requirements as illustrated 
in Section VIII. Similar derivations can be done in discrete 
time, but the advantage of using the analog description is that 
we do not necessarily have to target bandlimited signals. In the 
FRI literature [17]-[20], [22], [23], in the absence of noise, 
the sampling rate required for the unique reconstruction of 
the signal in (5) is the number of degrees of freedom of the 
signal $x(t)$ per shift $D$, equal to the number of unknowns 
$\{\tau_{i,r}, \omega_{i,r}, h_{i,r}\}_{i \in \mathcal{I},\, r = 1, \dots, R}$. This amounts to $P_{\min} = 3|\mathcal{I}|R$, 
which can be much less than what is needed in the MF 
approach, $P_{\min} \ll P_{\mathrm{MF}} = I|\mathcal{K}|$, when the number of active 
users is not large, $|\mathcal{I}| \ll I$, or the number of multipaths 
$R$ is much less than the dimension of the search space for 
Doppler $|\mathcal{K}|$. However, since the estimation and detection are 
performed in the presence of noise, the number of samples $P$ needs 
to be increased in general to enhance the sensitivity of the 
receiver. This gives the option of trading off accuracy with 
storage cost and computational complexity, by adjusting the 
number of samples $P$ to process between $P_{\min} \le P \le P_{\mathrm{MF}}$. 
Note that for different schemes, we need further processing 
to produce final decisions. This last step is different for 
different receivers. For example, the MF scheme has very 
simple post-processing at the cost of handling exhaustive MF 
samples, while the C-SA scheme can tune the number of 
measurements to handle less data by spending a higher premium 
on sparse recovery algorithms. Therefore, we discuss this 
trade-off in detail in Section VIII, with the MF 
being a benchmark for our comparison. 
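
As a quick numerical illustration of the range $P_{\min} \le P \le P_{\mathrm{MF}}$, the sketch below evaluates both bounds. The function name `sample_counts` and the numbers (10 possible users, 2 active, 2 paths, $K = 5$) are hypothetical; the formulas $P_{\min} = 3|\mathcal{I}|R$ and $P_{\mathrm{MF}} = I|\mathcal{K}|$ with $|\mathcal{K}| = 2K + 1$ follow the text.

```python
def sample_counts(num_users, num_active, R, K):
    """Return (P_min, P_MF) per shift.

    P_min = 3 * |I_active| * R   (degrees of freedom, noiseless case)
    P_MF  = I * |K|, |K| = 2K+1  (size of the MF filterbank)
    """
    p_min = 3 * num_active * R
    p_mf = num_users * (2 * K + 1)
    return p_min, p_mf


p_min, p_mf = sample_counts(num_users=10, num_active=2, R=2, K=5)
print(p_min, p_mf)
```

In this toy setting the designer may pick any number of channels between 12 and 110, trading sensitivity against storage and computation.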

B. CMS Observation Model 

Similar to [1], [2], we follow the analog description of 
(5) but discretize the parameters as in the MF approach. 
The analog domain derivation is mostly inspired by the FRI 
literature, but it does not imply that we estimate continuous 
parameters in this paper. For notational convenience, we 
introduce the triple-index coefficient 

$$\alpha_{i,k,q} = \sum_{j \in \mathcal{I}} \sum_{r=1}^{R} h_{j,r}\, \delta[i - j]\, \delta[k - k_{j,r}]\, \delta[q - q_{j,r}],$$ (21) 

for $k \in \mathcal{K}$, $q \in \mathcal{Q}$ as an indicator of whether the $i$th user is 
transmitting and whether there exists a link at a certain delay 
$\tau = q\Delta\tau$ with a certain carrier offset $\omega = k\Delta\omega$ in the window. 
Note that $\alpha_{i,k,q} = 0$ except when $k = k_{i,r}$ and $q = q_{i,r}$ for 
$i \in \mathcal{I}$. Denoting each MF template by 

$$\phi_{i,k,q}(t) \triangleq \phi_i(t - q\Delta\tau)\, e^{ik\Delta\omega t},$$ (22) 

the signal in (5) can be approximately expressed as 

$$x(t) \approx \sum_{i=1}^{I} \sum_{k \in \mathcal{K}} \sum_{q \in \mathcal{Q}} \alpha_{i,k,q}\, e^{ik\Delta\omega \ell D}\, \phi_{i,k,q}(t - \ell D) + v(t).$$ (23) 

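Clearly, $x(t)$ has at most $|\mathcal{I}|R$ active components due to the 
sparsity of $\alpha_{i,k,q}$. To facilitate notations in our derivations, 
we introduce the triplet index $(i, k, q)$ and define the 
length-$I|\mathcal{K}||\mathcal{Q}|$ link vector $\boldsymbol{\alpha}[\ell]$ at the $\ell$th shift as 

$$[\boldsymbol{\alpha}[\ell]]_{(i-1)|\mathcal{K}||\mathcal{Q}| + (k-1)|\mathcal{Q}| + q} \triangleq [\boldsymbol{\alpha}[\ell]]_{(i,k,q)} = \alpha_{i,k,q}.$$ (24) 

We define the associated delay-Doppler set for the $i$th user 
at the $\ell$th shift as 

$$\mathcal{A}_i^{[\ell]} = \{ (k, q) : |\alpha_{i,k,q}| > 0,\ k \in \mathcal{K},\ q \in \mathcal{Q} \}, \quad i = 1, \dots, I,$$ (25) 

from which we extract the time delays $\tau_{i,r} = q_{i,r}\Delta\tau$ and 
Doppler frequencies $\omega_{i,r} = k_{i,r}\Delta\omega$ for the active users $i \in \mathcal{I}$ 
if $\mathcal{A}_i^{[\ell]} \ne \emptyset$. 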
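
A minimal sketch of the link vector and of the delay-Doppler set in (25) is given below. The names `flat_index`, `link_vector` and `support` are hypothetical, and the zero-based index map with the offset $k + K$ is an assumption of this sketch chosen so that $k \in \{-K, \dots, K\}$ gives non-negative positions; it is not the exact index expression printed in (24).

```python
def flat_index(i, k, q, K, Q):
    """Zero-based position of (i, k, q); i in 1..I, k in -K..K, q in 0..Q-1."""
    return (i - 1) * (2 * K + 1) * Q + (k + K) * Q + q


def link_vector(paths, I, K, Q):
    """Build the sparse link vector from {(i, k, q): channel fade h}."""
    alpha = [0j] * (I * (2 * K + 1) * Q)
    for (i, k, q), h in paths.items():
        alpha[flat_index(i, k, q, K, Q)] += h
    return alpha


def support(alpha, i, K, Q):
    """Delay-Doppler set of user i: all (k, q) with a nonzero coefficient."""
    return [(k, q)
            for k in range(-K, K + 1)
            for q in range(Q)
            if abs(alpha[flat_index(i, k, q, K, Q)]) > 0]


# Toy case: I = 2 possible users, only user 2 active with R = 2 paths.
paths = {(2, 1, 3): 0.5 - 0.2j, (2, -1, 0): 1j}
alpha = link_vector(paths, I=2, K=1, Q=4)
print(len(alpha), support(alpha, 1, K=1, Q=4), support(alpha, 2, K=1, Q=4))
```

The vector has $I|\mathcal{K}||\mathcal{Q}| = 24$ entries, of which only $|\mathcal{I}|R = 2$ are nonzero, and the empty set for user 1 marks it as inactive.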

In this work, we consider the use of sampling kernels $\psi_p(t)$ that are linear combinations of all the MF templates [23]. The following theorem specifies the observed samples from the CMS architecture in Fig. 1 in relation to the link vector $\boldsymbol{\alpha}[\ell]$.

Theorem 1. Suppose that we choose sampling kernels $\{\psi_p(t)\}_{p=1}^{P}$ as linear combinations of the MF templates

$$\psi_p(t) = \sum_{i=1}^{I} \sum_{k \in \mathcal{K}} \sum_{q \in \mathcal{Q}} b_{p,(i,k,q)}\, \phi_{i,k,q}(t), \quad p = 1, \dots, P. \qquad (26)$$

The length-$P$ sample vector $\mathbf{c}[n] = [c_1[n], \cdots, c_P[n]]^T$ taken at shift $t = nD$ can then be expressed as

$$\mathbf{c}[n] = \mathbf{B}\mathbf{M}_{\phi\phi}[n - \ell]\,\boldsymbol{\Gamma}[\ell]\,\boldsymbol{\alpha}[\ell] + \boldsymbol{\nu}[n], \qquad (27)$$

where $\boldsymbol{\alpha}[\ell]$ is the link vector at the $\ell$th shift and



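1) $\mathbf{B}$ is a $P \times I|\mathcal{K}||\mathcal{Q}|$ matrix with $[\mathbf{B}]_{p,(i,k,q)} = b_{p,(i,k,q)}$;
$\mathbf{M}_{\phi\phi}[n - \ell]$ is an $I|\mathcal{K}||\mathcal{Q}| \times I|\mathcal{K}||\mathcal{Q}|$ matrix with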



M^[n-e] 
where 



(i' ,k' ,q' ) ,{i ,k ,q) 



R,, k ,^. JM] = e ^(At-q'Ar) R (k 7 k') [{q , ^ + At] ( 
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and is the ambiguity function 



R 



(k-k'), s _ 



(•)= / ^(*)&(*-> t(fc -*' )Aw Mt. (28) 



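2) $\boldsymbol{\Gamma}[\ell] = \mathbf{I}_I \otimes \mathbf{E}[\ell] \otimes \mathbf{I}_{|\mathcal{Q}|}$ with

$$\mathbf{E}[\ell] \triangleq \mathrm{diag}\left[ \cdots, e^{-\mathrm{i} k \Delta\omega \ell D}, \cdots \right]; \qquad (29)$$

3) $\boldsymbol{\nu}[n] = [\nu_1[n], \cdots, \nu_P[n]]^T$ is the filtered Gaussian noise vector with zero mean and covariance

$$\mathbf{R}_{\nu\nu} = \sigma^2 \mathbf{R}_{\psi\psi} \quad \text{with} \quad \mathbf{R}_{\psi\psi} = \mathbf{B}\mathbf{M}_{\phi\phi}[0]\mathbf{B}^H. \qquad (30)$$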

Proof: See Appendix A. ■
The freedom in choosing $\mathbf{B}$ allows us to optimize link acquisition performance. Before discussing the details of the optimization in Section VI, we further simplify the model in Theorem 1.
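The Kronecker structure of the phase rotation matrix in Theorem 1 means that it is diagonal and that its entries depend only on the Doppler index. The sketch below builds that diagonal under the assumption that the $k$th diagonal entry of $\mathbf{E}[\ell]$ is $e^{-\mathrm{i}k\Delta\omega\ell D}$; the function name and argument names are illustrative and not taken from the paper.

```python
import cmath


def gamma_diagonal(shift, num_users, doppler_indices, num_delay, d_omega, D):
    """Diagonal of I_I (x) E[shift] (x) I_|Q|, listed in (user, Doppler, delay) order."""
    # One phase term per Doppler bin (assumed form of the entries of E[shift]).
    e_diag = [cmath.exp(-1j * k * d_omega * shift * D) for k in doppler_indices]
    diag = []
    for _ in range(num_users):                # outer identity over users
        for phase in e_diag:                  # E[shift] over Doppler bins
            diag.extend([phase] * num_delay)  # inner identity over delays
    return diag
```

Because every entry has unit modulus, applying this matrix rotates the phases of the link vector but leaves its support and its amplitudes unchanged.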

C. CMS Sequential Acquisition Model 

Theorem 1 describes the general model of the samples $\mathbf{c}[n]$ obtained in the $n$th shift with respect to the link vector $\boldsymbol{\alpha}[\ell]$. However, the exact shift $\ell$ is unknown to the receiver. As mentioned earlier, determining the exact shift is not necessary to recover the link parameters, as long as the shift is properly aligned with the signal and produces a positive detection maximizing the likelihood ratio. In the following, we transform the observation model $\mathbf{c}[n]$ in Theorem 1 to an equivalent model. The equivalent model is stated with respect to a modified link vector $\boldsymbol{\alpha}[n]$ at the $n$th shift, whose entries are shifted according to the relative placement $(n - \ell)$ in relation to $\boldsymbol{\alpha}[\ell]$. The reason for this is that we can then use a time-invariant system matrix instead of the time-variant one $\mathbf{M}_{\phi\phi}[n - \ell]$ for the purpose of sequential detection.

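Theorem 2. Let $D = N\Delta\tau$ for some integer $N \in \mathbb{Z}$. The outputs $\mathbf{c}[n]$ of the compressive samplers can be re-written as

$$\mathbf{c}[n] = \mathbf{B}\mathbf{M}\boldsymbol{\Gamma}[n]\boldsymbol{\alpha}[n] + \boldsymbol{\nu}[n], \qquad (31)$$

where $\mathbf{M} = \mathbf{M}_{\phi\phi}[0]$, and $\boldsymbol{\alpha}[n]$ is the link vector at the $n$th shift

$$[\boldsymbol{\alpha}[n]]_{(i,k,q)} = \alpha_{i,k,q+(n-\ell)N}. \qquad (32)$$

Proof: See Appendix B. ■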
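Corollary 1. Let the delay-Doppler sets at the $n$th shift be

$$\mathcal{A}_i^{(n)} \triangleq \left\{ (k,q) : [\boldsymbol{\alpha}[n]]_{(i,k,q)} \neq 0,\ k \in \mathcal{K},\ q \in \mathcal{Q} \right\} \qquad (33)$$

for $i = 1, \cdots, I$. Then for any $(k,q) \in \mathcal{A}_i^{(\ell)}$ at the $\ell$th shift, we have $(k, q - (n - \ell)N) \in \mathcal{A}_i^{(n)}$ at the $n$th shift.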
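The index shift in Corollary 1 can be illustrated as follows. This is a sketch under stated assumptions: delay indices are taken as 1-based ranks in $\mathcal{Q}$, pairs whose shifted delay index leaves the grid are simply dropped, and the function name is hypothetical.

```python
def shift_support(support, n, ell, N, num_delay):
    """Map pairs (k, q) at shift ell to (k, q - (n - ell) * N) at shift n.

    Pairs whose shifted delay index falls outside 1..num_delay are outside
    the search window at shift n and are dropped.
    """
    shifted = set()
    for k, q in support:
        q_new = q - (n - ell) * N
        if q_new in range(1, num_delay + 1):
            shifted.add((k, q_new))
    return shifted
```

For example, with $N = 20$ and 28 delay bins, the set {(3, 25), (5, 10)} at shift $\ell = 4$ maps to {(3, 5)} at shift $n = 5$: the second pair falls out of the window, so the set at the $n$th shift can only shrink relative to the set at the $\ell$th shift.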

Using the modified sets $\mathcal{A}_i^{(n)}$, the number of delay-Doppler pairs included at the $n$th shift equals $\sum_{i=1}^{I} |\mathcal{A}_i^{(n)}|$. It is obvious that $|\mathcal{A}_i^{(n)}| \le |\mathcal{A}_i^{(\ell)}|$ for any $i$. At the $n$th shift, if a positive detection is declared and $|\mathcal{A}_i^{(n)}| = |\mathcal{A}_i^{(\ell)}|$ for all $i$, then the modified link vector $\boldsymbol{\alpha}[n]$ at the $n$th shift carries the same link information as the link vector $\boldsymbol{\alpha}[\ell]$ at the $\ell$th shift. Therefore, we use the model in Theorem 2 for our design and re-state the goal of link acquisition as

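1) locating the MLR shift $\ell_*$;

2) identifying the set of active users $\mathcal{I}$ indicated by the delay-Doppler sets $\mathcal{A}_i^{(\ell_*)} \neq \emptyset$;

3) resolving the delay-Doppler pairs in the $\ell_*$th window, $\mathcal{A}_i^{(\ell_*)} \subseteq \mathcal{K} \times \mathcal{Q}$ for $i \in \mathcal{I}$.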



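For better representation and comparison of the individual support set $\mathcal{A}_i^{(n)}$ in relation to the original support set $\mathcal{A}_i^{(\ell)}$, we introduce the full user-delay-Doppler sets for the link vectors $\boldsymbol{\alpha}[n]$ and $\boldsymbol{\alpha}[\ell]$ respectively

$$\mathcal{A}_n \triangleq \left\{ (i,k,q) : (k,q) \in \mathcal{A}_i^{(n)},\ i \in \mathcal{I} \right\}, \qquad (34)$$

$$\mathcal{A}_\ell \triangleq \left\{ (i,k,q) : (k,q) \in \mathcal{A}_i^{(\ell)},\ i \in \mathcal{I} \right\}. \qquad (35)$$

In the following sections, we express the link vector explicitly with respect to the full user-delay-Doppler set $\mathcal{A}_n$ and absorb the phase rotation matrix $\boldsymbol{\Gamma}[n]$ at the $n$th shift as

$$\boldsymbol{\beta}_{\mathcal{A}_n} = \boldsymbol{\Gamma}[n]\boldsymbol{\alpha}[n]. \qquad (36)$$

We call $\boldsymbol{\beta}_{\mathcal{A}_n} = [\cdots, \beta_{i,k,q}, \cdots]^T$ the modified link vector and note that it is also an $|\mathcal{I}|R$-sparse vector.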

V. Compressive Sequential Link Acquisition with 
Sparsity Regularization 

We now develop an SR-LRT detection algorithm that tackles the link acquisition problem by exploiting the compressive observation model given in Theorem 2. Link acquisition attempts to discriminate the true pattern $\mathcal{A}_n$ against all possible patterns $\mathcal{S}_n \neq \mathcal{A}_n$ at every shift $t = nD$ as a hypothesis testing problem

$$\mathcal{H}_{\mathcal{S}_n} : \mathbf{c}[n] = \mathbf{B}\mathbf{M}\boldsymbol{\beta}_{\mathcal{S}_n} + \boldsymbol{\nu}[n] \qquad (37)$$

over all possible $\mathcal{S}_n$ at every shift $t = nD$. Note that signal presence detection is also incorporated in this test by choosing $\mathcal{S}_n = \emptyset$ to be the null hypothesis. Given a specific set $\mathcal{S}_n$ for each possible $\boldsymbol{\beta}_{\mathcal{S}_n}$, the amplitudes of $\boldsymbol{\beta}_{\mathcal{S}_n}$ and the noise variance $\sigma^2$ are unknown and treated as nuisance parameters. Link acquisition thus amounts to detecting the full user-delay-Doppler set $\mathcal{S}_n$ among all possible $\mathcal{H}_{\mathcal{S}_n}$, with $\boldsymbol{\beta}_{\mathcal{S}_n}$ and the noise level $\sigma^2$ being nuisance parameters at each shift $n$. Following the GLRT rationale, the test consists in finding the set $\mathcal{S}_n$ maximizing



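$$f\left(\mathbf{c}[n] \mid \mathcal{H}_{\mathcal{S}_n}; \boldsymbol{\beta}_{\mathcal{S}_n}, \sigma^2\right) \qquad (38)$$

$$= \frac{1}{\pi^P |\mathbf{R}_{\nu\nu}|} \exp\left( -\left\| \mathbf{c}[n] - \mathbf{B}\mathbf{M}\boldsymbol{\beta}_{\mathcal{S}_n} \right\|^2_{\mathbf{R}_{\nu\nu}^{-1}} \right) \qquad (39)$$

in the presence of unknown parameters $\boldsymbol{\beta}_{\mathcal{S}_n}$ and $\sigma^2$.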

Note that when $\mathbf{B} = \mathbf{I}$ is the identity, the samples $\mathbf{c}[n]$ are equivalent to the outputs of the MF approach. This implies that the samples $\mathbf{c}[n]$ obtained in the CMS architecture, using only $P$ sampling kernels in (26), are equivalent to a linearly compressed version of the exhaustive MF output, even though they are obtained directly from the A/D architecture instead of using an exhaustive MF filterbank followed by a linear compressor $\mathbf{B}$ as in [24], [18], which would be much more complex. On the other hand, the difference in post-processing between the MF and CMS architectures is that the MF allows one to simply pick the hypothesis corresponding to the largest magnitude in the output $\mathbf{c}[n]$ as the detection result, whereas the compressive samples $\mathbf{c}[n]$ necessarily require a more sophisticated detection scheme.

A. Sequential Estimation for Link Acquisition 

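The GLRT involved in the link acquisition requires estimating $\boldsymbol{\beta}_{\mathcal{S}_n}$ and $\sigma^2$ for every possible $\mathcal{S}_n$ at every shift $t = nD$. For every hypothesis $\mathcal{S}_n \neq \emptyset$, the estimate of $\boldsymbol{\beta}_{\mathcal{S}_n}$ under colored Gaussian noise $\boldsymbol{\nu}[n]$ with covariance $\mathbf{R}_{\nu\nu} = \sigma^2\mathbf{R}_{\psi\psi}$ is then obtained as

$$\hat{\boldsymbol{\beta}}_{\mathcal{S}_n} = \arg\min_{\boldsymbol{\beta}_{\mathcal{S}_n}} \left\| \mathbf{c}[n] - \mathbf{B}\mathbf{M}\boldsymbol{\beta}_{\mathcal{S}_n} \right\|^2_{\mathbf{R}_{\psi\psi}^{-1}}. \qquad (40)$$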



The "hat" notation on the vector $\hat{\boldsymbol{\beta}}_{\mathcal{S}_n}$ refers to the estimates of the amplitudes on the support $\mathcal{S}_n$. The total number of such estimates scales with the number of hypotheses, which in this case is $2^{I|\mathcal{K}||\mathcal{Q}|}$, resulting in an NP-hard combinatorial estimation problem. Instead, we obtain the estimates using a sparse approach similar to [1], [2] for the GLRT. Specifically, we solve the combinatorial problem in a "soft" fashion at every shift $t = nD$, similar to [2]:

$$\hat{\boldsymbol{\beta}} \triangleq \arg\min_{\boldsymbol{\beta}} \left\| \mathbf{c}[n] - \mathbf{B}\mathbf{M}\boldsymbol{\beta} \right\|^2_{\mathbf{R}_{\psi\psi}^{-1}} + \lambda \cdot f(\boldsymbol{\beta}), \qquad (41)$$

where $\lambda$ is a regularization parameter and $f(\boldsymbol{\beta}) = \|\boldsymbol{\beta}\|_0$ or $f(\boldsymbol{\beta}) = \|\boldsymbol{\beta}\|_1$ is the sparsity regularization constraint. If the $\|\cdot\|_0$ constraint is imposed, the problem is approximately solved via greedy methods such as orthogonal matching pursuit (OMP) [28]. When the $\|\cdot\|_1$ norm is used, this problem can be solved via convex programs [29]. Generally speaking, as discussed in [2] and [29], the required number of samples $P$ for sparse recovery in the noiseless case scales logarithmically with the dimension of the length-$I|\mathcal{K}||\mathcal{Q}|$ link vector, $P \propto |\mathcal{I}|R \log I|\mathcal{K}||\mathcal{Q}|$, which increases if the discretizations are made finer.
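To make the greedy alternative concrete, the following is a minimal OMP sketch. It is a simplified illustration rather than the receiver used in the paper: it is real-valued, uses the plain Euclidean norm (equivalent to assuming the samples have already been whitened), takes the system matrix `A` in place of $\mathbf{B}\mathbf{M}$, assumes no all-zero columns, and runs for a fixed number of iterations. All names are hypothetical.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def omp(A, c, sparsity):
    """Greedy sparse fit of c by A * beta with at most `sparsity` nonzeros.

    A is a list of rows (P x L), c is a list of length P.
    """
    P, L = len(A), len(A[0])
    cols = [[A[p][j] for p in range(P)] for j in range(L)]
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    support, residual, coef = [], list(c), []
    for _ in range(sparsity):
        # Score each unused column by its normalized correlation with the residual.
        scores = [-1.0 if j in support
                  else abs(dot(cols[j], residual)) / dot(cols[j], cols[j]) ** 0.5
                  for j in range(L)]
        support.append(max(range(L), key=scores.__getitem__))
        # Least squares on the selected columns via the normal equations.
        G = [[dot(cols[a], cols[b]) for b in support] for a in support]
        rhs = [dot(cols[a], c) for a in support]
        coef = solve(G, rhs)
        # Cancel the contribution of all selected columns before the next search.
        residual = [c[p] - sum(x * cols[j][p] for x, j in zip(coef, support))
                    for p in range(P)]
    beta = [0.0] * L
    for x, j in zip(coef, support):
        beta[j] = x
    return beta, sorted(support)
```

The residual update is what distinguishes this from a plain matched filter: once a component is selected, its contribution is removed before the remaining columns are scored again.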

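From the solution of (41), we extract the full user-delay-Doppler set $\hat{\mathcal{A}}_n$ from the soft estimate $\hat{\boldsymbol{\beta}}$, which in turn gives the estimated user set $\hat{\mathcal{I}}$ and the estimated delay-Doppler set $\hat{\mathcal{A}}_i^{(n)}$ for each user $i \in \hat{\mathcal{I}}$:

$$\hat{\mathcal{A}}_n = \mathcal{E}(\hat{\boldsymbol{\beta}}) \ \Rightarrow\ \hat{\mathcal{I}} \ \text{and} \ \hat{\mathcal{A}}_i^{(n)}. \qquad (42)$$

Here $\mathcal{E}(\cdot)$ is the extraction mapping from the soft estimate to the estimated user-delay-Doppler set. The extraction method is explained in Section V-B.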


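With the estimated set of active users $\hat{\mathcal{I}}$ and the individual delay-Doppler sets $\hat{\mathcal{A}}_i^{(n)}$, we have the truncated estimate of the link vector $\hat{\boldsymbol{\beta}}_{\hat{\mathcal{A}}_n}$ and the estimated noise variance

$$\hat{\sigma}^2 = \frac{1}{P} \left\| \mathbf{c}[n] - \mathbf{B}\mathbf{M}\hat{\boldsymbol{\beta}}_{\hat{\mathcal{A}}_n} \right\|^2_{\mathbf{R}_{\psi\psi}^{-1}}. \qquad (43)$$

Since the formulation in (41) is no longer maximum likelihood due to the sparsity regularization, we call it the Sparsity-Regularized Likelihood Ratio Test (SR-LRT).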









B. User-Delay-Doppler Set Extraction $\mathcal{E}(\hat{\boldsymbol{\beta}})$

Given the soft estimate $\hat{\boldsymbol{\beta}}$ in (41) at every shift $t = nD$, the estimated user-delay-Doppler set $\hat{\mathcal{A}}_n$ is extracted depending on the application scenarios below.

1) Unknown, random number of active users $\mathcal{I}$: In random access communications the receiver has no knowledge of who is active, nor any expectation on the number of components it is likely to detect. Using the soft estimate $\hat{\boldsymbol{\beta}}$, we identify the active users as

$$\hat{\mathcal{I}} = \left\{ i : \max_{k,q} |\hat{\beta}_{i,k,q}|^2 \ge \rho_i,\ k \in \mathcal{K},\ q \in \mathcal{Q} \right\}, \qquad (44)$$

where $\rho_i$ is a chosen threshold for that specific user to be considered present, usually set as a fraction of the magnitude of the amplitudes in $\hat{\boldsymbol{\beta}}$. Then, for each detected active user $i \in \hat{\mathcal{I}}$, we take the $R$ strongest paths in $\hat{\beta}_{i,k,q}$ with respect to $k \in \mathcal{K}$ and $q \in \mathcal{Q}$ to be the active set $\hat{\mathcal{A}}_i^{(n)}$.

2) Partial knowledge on active users $\mathcal{I}$: This scenario corresponds to environments where all users are active but only a certain subset is likely to be detectable by the receiver. GPS receivers are an example. Specifically, there are a total of $I = 24$ quasi-stationary GPS satellites moving around the earth, and the active satellites in the field-of-view of a specific GPS receiver are unknown. However, the GPS receiver is informed that at any point in space there should be $|\mathcal{I}| = 4$ strongest signals from satellites, and it attempts to find such signals, along with their delay-Doppler parameters, for triangulation. In this case a positive detection corresponds to having at least four components detected, and we can interpret this case as fixing $|\mathcal{I}|$ for the receiver detection. In general, we identify the users $i \in \hat{\mathcal{I}}$ as those with the $|\mathcal{I}|$ strongest amplitudes $|\hat{\beta}_{i,k,q}|$ with respect to $i = 1, \cdots, I$ in $\hat{\boldsymbol{\beta}}$. Then we take the $R$ strongest paths in $|\hat{\beta}_{i,k,q}|$ with respect to $k$ and $q$ to be the active set $\hat{\mathcal{A}}_i^{(n)}$ for each user $i \in \hat{\mathcal{I}}$.

3) Known active users $\mathcal{I}$: This scenario includes multi-antenna and cooperative transmission systems, where the receiver is aware of the active sources, i.e., $\mathcal{I}$ is known. This case is trivial because we do not need to identify the active users. The active set $\hat{\mathcal{A}}_i^{(n)}$ for each user $i \in \mathcal{I}$ is formed by picking the $R$ strongest components in $|\hat{\beta}_{i,k,q}|$ with respect to $k \in \mathcal{K}$ and $q \in \mathcal{Q}$.
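The three extraction scenarios above can be sketched with one helper. This is an illustrative sketch only: the soft estimate is assumed to be stored as a dictionary keyed by $(i, k, q)$, the threshold in scenario 1 is taken as a fraction of the largest squared amplitude, and the function and argument names are hypothetical.

```python
def extract_support(beta_hat, R, threshold_fraction=1 / 3,
                    num_active=None, known_users=None):
    """Return {user: [(k, q), ...]} with up to R strongest paths per active user.

    beta_hat maps (i, k, q) to a (possibly complex) amplitude estimate.
    """
    power = {idx: abs(v) ** 2 for idx, v in beta_hat.items()}
    peak = {}  # strongest squared amplitude observed for each user
    for (i, k, q), p in power.items():
        peak[i] = max(peak.get(i, 0.0), p)
    if not peak:
        return {}
    if known_users is not None:      # scenario 3: active users are known
        users = set(known_users)
    elif num_active is not None:     # scenario 2: the number of users is fixed
        users = set(sorted(peak, key=peak.get, reverse=True)[:num_active])
    else:                            # scenario 1: threshold on the per-user peak
        rho = threshold_fraction * max(peak.values())
        users = {i for i, p in peak.items() if p >= rho and p > 0.0}
    support = {}
    for i in users:
        paths = sorted(((p, k, q) for (u, k, q), p in power.items()
                        if u == i and p > 0.0), reverse=True)
        support[i] = [(k, q) for _, k, q in paths[:R]]
    return support
```

An empty result in scenario 1 corresponds to declaring the channel idle, since no user has a component above the threshold.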



C. Sequential Detection for Link Acquisition 

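Substituting $\hat{\boldsymbol{\beta}}_{\hat{\mathcal{A}}_n}$ and $\hat{\sigma}^2$ back into (38), the generalized likelihood ratio can be computed as

$$\eta_{\text{C-SA}}(n) = \left( \frac{\left\| \mathbf{c}[n] \right\|_{\mathbf{R}_{\psi\psi}^{-1}}}{\left\| \mathbf{c}[n] - \mathbf{B}\mathbf{M}\hat{\boldsymbol{\beta}}_{\hat{\mathcal{A}}_n} \right\|_{\mathbf{R}_{\psi\psi}^{-1}}} \right)^{2P} > \eta_0, \qquad (45), (46)$$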



which indicates the presence of the signal if $\eta_{\text{C-SA}}(n) > \eta_0 > 1$, so that the receiver knows that certain signal components are captured in the observation. Denote the first window that passes the above test as $N_\eta = \min\{n : \eta_{\text{C-SA}}(n) > \eta_0\}$. As mentioned in (15), the MLR window is located as the window that maximizes the likelihood ratio

$$\ell_* = \arg\max_n \eta_{\text{C-SA}}(n), \quad n = N_\eta, \cdots, N_\eta + N. \qquad (47)$$

Accordingly, from the link vector $\hat{\boldsymbol{\beta}}_{\hat{\mathcal{A}}_{\ell_*}}$ in the $\ell_*$th window, we can extract the delay-Doppler pairs

$$\hat{\tau}_{i,r} = q_{i,r}\Delta\tau, \quad \hat{\omega}_{i,r} = k_{i,r}\Delta\omega, \quad (k,q) \in \hat{\mathcal{A}}_i^{(\ell_*)},\ i \in \hat{\mathcal{I}}. \qquad (48)$$
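The two-stage search described above (first threshold crossing, then a maximization over the next $N$ shifts) can be sketched as follows. The list `eta` is assumed to hold the likelihood ratio already computed for each shift, and the function name is hypothetical.

```python
def locate_mlr_shift(eta, eta0, horizon):
    """Return the shift that maximizes eta over first..first+horizon, or None.

    `first` is the first shift whose likelihood ratio exceeds eta0; if no
    shift passes the test, no signal is declared.
    """
    first = next((n for n, v in enumerate(eta) if v > eta0), None)
    if first is None:
        return None
    last = min(first + horizon, len(eta) - 1)
    return max(range(first, last + 1), key=eta.__getitem__)
```

For instance, with `eta = [1.0, 1.2, 3.0, 5.0, 4.0, 9.0]`, threshold 2.0 and a horizon of 2 shifts, the first crossing is at shift 2 and the maximum over shifts 2 to 4 is at shift 3; the larger value at shift 5 lies outside the search horizon and is not considered.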

VI. Optimization of Compressive Samplers 

The link acquisition performance depends on the ability of the SR-LRT to differentiate between different hypotheses $\mathcal{H}_{\mathcal{S}_n}$. In this section, we seek a criterion to optimize the sampling kernels $\{\psi_p(t)\}_{p=1}^{P}$ by designing the matrix $\mathbf{B}$. The metric we maximize is the weighted average of the Kullback-Leibler (KL) distances between any pair of hypotheses $\mathcal{H}_{\mathcal{S}_n}$ in (37). Since every possible pattern for $\mathcal{S}_n$ is independent of $n$, here we omit the subscript for convenience. In choosing the KL distance we are motivated by the Chernoff-Stein lemma [30], which indicates that the probability of confusing $\mathcal{H}_{\mathcal{S}}$ and $\mathcal{H}_{\mathcal{S}'}$ decreases exponentially with the pair-wise KL distance between them. As we point out later in this section, if the noise is Gaussian and the weights are chosen appropriately, then the weighted average of all the pair-wise KL distances has the same expression as the Chernoff information under the Bayesian detection framework, implying that the average KL distance is an effective measure for evaluating detection performance. Consistent with our system model and detection formulation, we proceed with our analysis using the average KL distance with some pre-defined weights.

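Introducing $\mathcal{G}(\mathbf{B}) = \mathbf{M}^H\mathbf{B}^H(\mathbf{B}\mathbf{M}\mathbf{B}^H)^{-1}\mathbf{B}\mathbf{M}$, the pair-wise KL distance between any $\mathcal{H}_{\mathcal{S}}$ and $\mathcal{H}_{\mathcal{S}'}$ is given by [31]

$$D(\mathcal{H}_{\mathcal{S}} \| \mathcal{H}_{\mathcal{S}'}) = \frac{(\boldsymbol{\beta}_{\mathcal{S}} - \boldsymbol{\beta}_{\mathcal{S}'})^H\, \mathcal{G}(\mathbf{B})\, (\boldsymbol{\beta}_{\mathcal{S}} - \boldsymbol{\beta}_{\mathcal{S}'})}{\sigma^2}. \qquad (49)$$

If the pair-wise KL distance is zero, then the two hypotheses $\mathcal{H}_{\mathcal{S}}$ and $\mathcal{H}_{\mathcal{S}'}$ are indistinguishable for that particular pair of $\mathcal{S}$ and $\mathcal{S}'$. A non-zero pair-wise KL distance between an arbitrary pair of $\mathcal{S}$ and $\mathcal{S}'$ with $|\mathcal{S}|, |\mathcal{S}'| \le s$ requires $\mathrm{spark}[\mathcal{G}(\mathbf{B})] > 2s$, where $\mathrm{spark}[\cdot]$ is the Kruskal rank of a matrix. We note that the average KL distance metric we are about to define does not automatically ensure that $D(\mathcal{H}_{\mathcal{S}} \| \mathcal{H}_{\mathcal{S}'}) > 0$ for all $\mathcal{S} \neq \mathcal{S}'$.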

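To define the average KL distance, we associate each distinct pair of supports $\mathcal{S}$ and $\mathcal{S}'$ with the weight $\gamma_{\mathcal{S},\mathcal{S}'}$. Furthermore, we associate the nuisance amplitudes in $\boldsymbol{\beta}_{\mathcal{S}}$ with a multidimensional continuous weighting function $P(\boldsymbol{\beta}_{\mathcal{S}})$ for any $\mathcal{S}$. Under these assumptions, the weighted average of all pair-wise KL distances is defined as

$$D = \sum_{\mathcal{S}} \sum_{\mathcal{S}' \neq \mathcal{S}} \gamma_{\mathcal{S},\mathcal{S}'} \iint P(\boldsymbol{\beta}_{\mathcal{S}})\, P(\boldsymbol{\beta}_{\mathcal{S}'})\, D(\mathcal{H}_{\mathcal{S}} \| \mathcal{H}_{\mathcal{S}'})\, d\boldsymbol{\beta}_{\mathcal{S}}\, d\boldsymbol{\beta}_{\mathcal{S}'}. \qquad (50)$$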

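Proposition 1. Given a set of normalized constant weights $\gamma_{\mathcal{S},\mathcal{S}'}$ for every distinct pair $\mathcal{S}, \mathcal{S}'$, and a continuous weighting function $P(\boldsymbol{\beta}_{\mathcal{S}}) = \prod_{(i,k,q) \in \mathcal{S}} P(\beta_{i,k,q})$ over the nuisance amplitudes with $\int \boldsymbol{\beta}_{\mathcal{S}} P(\boldsymbol{\beta}_{\mathcal{S}})\, d\boldsymbol{\beta}_{\mathcal{S}} = \mathbf{0}$ and $\int |\beta_{i,k,q}|^2 P(\beta_{i,k,q})\, d\beta_{i,k,q} = \text{constant}$, the average KL distance is equal, up to a constant factor, to

$$D \propto \mathrm{Tr}\left[ \mathbf{M}^H\mathbf{B}^H(\mathbf{B}\mathbf{M}\mathbf{B}^H)^{-1}\mathbf{B}\mathbf{M} \right]. \qquad (51)$$

Proof: See Appendix C. ■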
The way we choose the weights $\gamma_{\mathcal{S},\mathcal{S}'}$ and weighting functions $P(\boldsymbol{\beta}_{\mathcal{S}})$ is equivalent to assuming a uniform distribution on the users, delays and Dopplers, together with i.i.d. Gaussian priors on the amplitudes in $\boldsymbol{\beta}_{\mathcal{S}}$, in a Bayesian detection framework. According to [31], the average KL distance obtained in (51) has the same expression as the Chernoff information, which determines the Bayesian detection error exponent. Therefore, maximizing the average KL distance in a sense maximizes the error exponent that governs the exponential decay of the Bayesian detection error (or of the miss detection probability under the Neyman-Pearson detection framework).

We note that it is possible that specific choices of the number of samplers $P$ and of the dictionary $\{\phi_{i,k,q}(t)\}$ (i.e., the Gram matrix $\mathbf{M}$) lead to indistinguishable sparsity patterns [29] such that $\mathrm{spark}(\mathcal{G}(\mathbf{B})) \le |\mathcal{S}| + |\mathcal{S}'|$. In other words, the design of $\mathbf{B}$ cannot cure intrinsic problems caused by the choice of $P$ or the Gram matrix $\mathbf{M}$, which are given parameters in the optimization. In communications, the intrinsic problems caused by the Gram matrix $\mathbf{M}$ are typically handled by optimizing the transmit sequences $\phi_i(t)$, irrespective of the receiver choice, such that $\mathbf{M}$ becomes diagonally dominant. This corresponds to having a well localized ambiguity function for each of the $\phi_i(t)$ and low cross-correlation between $\phi_i(t)$'s with different delays and Dopplers. Gold sequences used in GPS and M-sequences used in spread spectrum communications, for example, are known to have good properties in this regard. This is a well investigated problem [32] that we do not aim to cover in this paper.

Given $P$ and $\mathbf{M}$, we propose an optimal $\mathbf{B}$ that maximizes the average KL distance $D$ if there is a unique solution to the optimization; when there are multiple solutions that yield an identical average KL distance $D$, we further choose in the feasible set the matrix $\mathbf{B}$ that gives the least occurrence of events $D(\mathcal{H}_{\mathcal{S}} \| \mathcal{H}_{\mathcal{S}'}) = 0$. We use the results in the following lemma for our optimization.

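Lemma 1. (Ratio Trace Maximization [33]) Given two $L \times L$ positive semi-definite matrices $\mathbf{S}$ and $\mathbf{G}$, and an arbitrary $L \times P$ full column rank matrix $\mathbf{W}$, the ratio trace problem is formulated as

$$\mathbf{W}_{\mathrm{opt}} = \arg\max_{\mathbf{W}} \mathrm{Tr}\left[ \left( \mathbf{W}^H\mathbf{S}\mathbf{W} \right)^{-1} \mathbf{W}^H\mathbf{G}\mathbf{W} \right]. \qquad (52)$$

The optimal $\mathbf{W}_{\mathrm{opt}} = [\mathbf{w}_1^{\mathrm{opt}}, \cdots, \mathbf{w}_P^{\mathrm{opt}}]$ is given by the generalized eigenvectors $\mathbf{w}_p^{\mathrm{opt}}$, $p = 1, \cdots, P$, corresponding to the $P$ largest generalized eigenvalues of the pair $(\mathbf{S}, \mathbf{G})$ with $P \le \mathrm{rank}[\mathbf{S}]$.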



The optimal $\mathbf{B}$ is identified in the following theorem.³

³ Note that the computation of the weights is done offline, and does not add complexity to the online processing.

Theorem 3. Let $\mathbf{M} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{U}^H$, where $\boldsymbol{\Sigma} = \mathrm{diag}[\sigma_1, \cdots, \sigma_{I|\mathcal{K}||\mathcal{Q}|}]$ is the eigenvalue matrix in descending order and $\mathbf{U}$ is the eigenvector matrix of $\mathbf{M}$. Denote the set of $P \le \mathrm{rank}[\boldsymbol{\Sigma}]$ principal eigenvectors

$$\mathcal{U} \triangleq \left\{ \mathbf{U}_P = [\mathbf{u}_1, \cdots, \mathbf{u}_P] : \qquad (53)$$
$$\mathbf{M} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{U}^H,\ \mathbf{U} = [\mathbf{u}_1, \cdots, \mathbf{u}_{I|\mathcal{K}||\mathcal{Q}|}] \right\}. \qquad (54)$$

Let $\boldsymbol{\Xi}_P$ be an arbitrary non-singular $P \times P$ matrix. When the eigenvectors are unique, the matrix $\mathbf{B}^* = \boldsymbol{\Xi}_P\mathbf{U}_P^H$ is chosen uniquely to maximize the average KL distance $D$ in (50). When the eigenvectors are not unique, we choose $\mathbf{U}_P^* = \arg\max_{\mathbf{U}_P \in \mathcal{U}} \mathrm{spark}(\mathbf{U}_P^H)$ to maximize the average KL distance and minimize the occurrence of events $D(\mathcal{H}_{\mathcal{S}} \| \mathcal{H}_{\mathcal{S}'}) = 0$ for $\mathcal{S} \neq \mathcal{S}'$.

Proof: See Appendix D. ■
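As a quick consistency check of Theorem 3 (a sketch, not a substitute for the proof in Appendix D), the trace in (51) can be evaluated at the theorem's choice, written here as $\mathbf{B}^* = \boldsymbol{\Xi}_P\mathbf{U}_P^H$ with $\boldsymbol{\Xi}_P$ non-singular. The derivation assumes that $\mathbf{M}$ is Hermitian with eigendecomposition $\mathbf{U}\boldsymbol{\Sigma}\mathbf{U}^H$, that $\boldsymbol{\Sigma}_P = \mathrm{diag}[\sigma_1, \dots, \sigma_P]$ is invertible (i.e. $P$ does not exceed the rank of $\mathbf{M}$), and it uses the shorthand $\mathcal{G}(\mathbf{B}) = \mathbf{M}^H\mathbf{B}^H(\mathbf{B}\mathbf{M}\mathbf{B}^H)^{-1}\mathbf{B}\mathbf{M}$.

```latex
\begin{align*}
\mathbf{B}^* \mathbf{M}
  &= \boldsymbol{\Xi}_P \mathbf{U}_P^H \mathbf{U} \boldsymbol{\Sigma} \mathbf{U}^H
   = \boldsymbol{\Xi}_P \boldsymbol{\Sigma}_P \mathbf{U}_P^H, \\
\mathbf{B}^* \mathbf{M} \mathbf{B}^{*H}
  &= \boldsymbol{\Xi}_P \boldsymbol{\Sigma}_P \mathbf{U}_P^H \mathbf{U}_P \boldsymbol{\Xi}_P^H
   = \boldsymbol{\Xi}_P \boldsymbol{\Sigma}_P \boldsymbol{\Xi}_P^H, \\
\mathcal{G}(\mathbf{B}^*)
  &= \mathbf{U}_P \boldsymbol{\Sigma}_P \boldsymbol{\Xi}_P^H
     \left( \boldsymbol{\Xi}_P \boldsymbol{\Sigma}_P \boldsymbol{\Xi}_P^H \right)^{-1}
     \boldsymbol{\Xi}_P \boldsymbol{\Sigma}_P \mathbf{U}_P^H
   = \mathbf{U}_P \boldsymbol{\Sigma}_P \mathbf{U}_P^H, \\
\mathrm{Tr}\left[ \mathcal{G}(\mathbf{B}^*) \right]
  &= \mathrm{Tr}\left[ \boldsymbol{\Sigma}_P \mathbf{U}_P^H \mathbf{U}_P \right]
   = \sum_{p=1}^{P} \sigma_p .
\end{align*}
```

The result does not depend on $\boldsymbol{\Xi}_P$, which is consistent with any non-singular $\boldsymbol{\Xi}_P$ being admissible, and it equals the sum of the $P$ largest eigenvalues of $\mathbf{M}$.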
The theorem above suggests that, given $\mathbf{B}^*$, the sampling kernels can be designed as

$$\psi_p(t) = \sum_{i=1}^{I} \sum_{k \in \mathcal{K}} \sum_{q \in \mathcal{Q}} b^*_{p,(i,k,q)}\, \phi_{i,k,q}(t), \quad p = 1, \cdots, P. \qquad (55)$$

Note that, as long as the preamble sequences do not change, the optimal matrix $\mathbf{B}^*$ and the corresponding sampling kernels $\psi_p(t)$ are pre-computed only once, and their design does not contribute to the running cost of the receiver operations. If the projections on the sampling kernels are implemented in the digital domain instead of being analog filters, then the samples of $\psi_p(t)$ are placed in the static memory that contains the receiver signal processing algorithms.

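If the principal eigenvectors $\mathbf{U}_P$ are unique, then the choice of $\mathbf{B}^*$ in general spreads out the pairwise KL distances, but $D(\mathcal{H}_{\mathcal{S}} \| \mathcal{H}_{\mathcal{S}'}) = 0$ is possible for some choice of $\mathcal{S}$ and $\mathcal{S}'$. If $\mathrm{spark}(\mathbf{M}) > 2|\mathcal{I}|R$ and we choose $P = \mathrm{rank}[\boldsymbol{\Sigma}]$, then it can be ensured that $D$ is maximized and $D(\mathcal{H}_{\mathcal{S}} \| \mathcal{H}_{\mathcal{S}'}) > 0$ is guaranteed for $\mathcal{S} \neq \mathcal{S}'$ with $|\mathcal{S}|, |\mathcal{S}'| \le |\mathcal{I}|R$. On the other hand, an extreme example where the eigenvectors are not unique is when the templates $\{\phi_{i,k,q}(t)\}$ form an orthogonal basis such that $\mathbf{M} = \mathbf{I}$. In this case, Theorem 3 is analogous to the fundamental criterion in compressed sensing that aims to find a matrix with $\mathrm{spark}(\mathbf{U}_P^H) > 2|\mathcal{I}|R$, which guarantees the recovery of any $|\mathcal{I}|R$-sparse vector.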

Remark: The number $|\mathcal{U}|$ of possible eigen-decompositions of the given matrix $\mathbf{M}$ may be quite large. For the extreme case when $\mathbf{M} = \mathbf{I}$, an arbitrary unitary matrix is a possible choice. Fortunately, it is well known in compressive sensing that partial unitary matrices (such as the partial DFT matrix) have good compressive sensing properties (mutual coherence), so not much is lost if the matrix does not have exactly the maximum spark. On the other hand, as long as the number $|\mathcal{U}|$ is small, a finite search is also possible. More importantly, this task only needs to be done once and off-line.

VII. Numerical Results 

In this section, we compare the C-SA acquisition scheme using the CMS architecture we propose against the alternative schemes. The first alternative is the D-SA scheme discussed in Section III-C, which processes the uncompressed Nyquist-rate samples $\mathbf{c}_{\mathrm{DS}}[n]$ using sparse recovery methods. Another alternative is the MF receiver, which also processes uncompressed samples. Rather than exploiting the underlying sparsity of the signal, it uses a filterbank that matches the signal with all possible templates considered as hypotheses, trying to identify the link parameters through the best (highest) match.

To benchmark our C-SA against the D-SA and MF schemes, we simulate the link acquisition of a single receiver plugged into a network populated by $I = 10$ users, out of which $|\mathcal{I}| = 4$ are randomly chosen to be actively transmitting. The user signature codes $\{a_i[m]\}$ in (1) belong to a set of M-sequences [34], which are quasi-orthogonal BPSK sequences of length $M = 255$ with unit power ($\|a_i[m]\|^2 = 1$). Due to user dislocation, mobility and possible scattering, each path of this asynchronous multi-user channel is characterized by the triplet $\{h_{i,r}, \tau_{i,r}, \omega_{i,r}\}_{i \in \mathcal{I},\, r \in \{1,\dots,R\}}$, where the $\{h_{i,r}\}$ are Rayleigh distributed, $h_{i,r} \sim \mathcal{CN}(0, 1/(R|\mathcal{I}|))$, and uncorrelated, $E\{h_{i,r}h^*_{i',r'}\} = 1/(|\mathcal{I}|R)\,\delta[i - i']\,\delta[r - r']$, normalized fading coefficients. The random delays $\{\tau_{i,r}\}$ are the sum of: (i) a time of arrival $t_0 = \min_{i,r} \tau_{i,r}$ that is uniformly distributed over an interval that spans the duration of the preamble, $t_0 \in \mathcal{U}(0, MT)$, and (ii) multipath delays that are uniformly distributed within an interval $(t_0, t_0 + t_{\max})$, where $t_{\max}$ is the maximum multipath delay spread of the channel. Consequently, all the arrival times are within a window of duration $2MT + t_{\max}$. The random frequency offsets $\{\omega_{i,r}\}$ are uniformly distributed, $\omega_{i,r} \in \mathcal{U}(-\omega_{\max}, \omega_{\max})$, over a range delimited at each side by the maximum Doppler spread $\omega_{\max}$. As we simulate underspread channel conditions, we choose $\omega_{\max}$ such that $\omega_{\max} t_{\max} \ll 2\pi$. Thus, for a multipath delay spread of $t_{\max} = 4T$ the choice of $\omega_{\max} = 2.5 \cdot 10^{-3} \times 2\pi/T$ is comparable to a 25 kHz offset for a 1 MHz signal.

More specifically, we compare the C-SA with the D-SA and the MF at the same resolutions for both Doppler and delays, which are, respectively, $\Delta\omega = \omega_{\max}/5 = 0.5 \cdot 10^{-3} \times 2\pi/T$ and $\Delta\tau = T/2$. We test the C-SA using three different numbers of sampling channels, $P = \{60, 80, 100\}$. On the other hand, the D-SA scheme uses Nyquist-rate samples per shift, which corresponds to using the whole spreading code duration of 255 samples per symbol, while the MF scheme uses an $I|\mathcal{K}| = 110$-channel filterbank and performs $I|\mathcal{K}||\mathcal{Q}| = 2640$ projections per shift. In fact, in the simulation, the sequential processing is performed by generating $N = D/T_s$ new Nyquist samples for each shift $t = nD$ with $D = 10T$, updating an internal buffer of size $W \le ((M + 2L_g)T + \tau_{\max})/T_s$, where $L_g T = 3$ is the duration of the side lobe of $g(t)$ in (1) in samples. For a channel with a multipath delay spread of $t_{\max} = 4T$ and a shift of size $D = 10T$, the delay search space $\mathcal{Q}$ accounts for a potential displacement of $\tau_{\max} \ge t_{\max} + D = 14T$, therefore $\tau_{\max}/\Delta\tau \le |\mathcal{Q}| = 28$. For a frequency shift that spans the range $(-\omega_{\max}, \omega_{\max})$, the size of the discretized Doppler space is $|\mathcal{K}| = 2(\omega_{\max}/\Delta\omega) + 1 = 11$. This value, together with the number of discretized delays, leads to a multi-user time-frequency grid of $I|\mathcal{K}||\mathcal{Q}| = 2640$ elements, common to both the C-SA and the alternative MF and D-SA schemes.

In the simulations, the C-SA compressive samples are generated from the same Nyquist-rate samples used for the other receivers, by projecting them onto the digitized version of the sampling kernels $\psi_p(nT_s)$, $p = 1, \dots, P$, $n = 0, \dots, M - 1$. The C-SA simulation recovers the link parameters by solving (41) with the OMP algorithm [28], which is a popular choice to approximate the solution of a sparse problem [35]. To motivate our selection of OMP, we refer to Section VIII for an empirical evaluation of the OMP against two well-known $\ell_1$ minimizers, SpaRSA [36] and $\ell_1$-Homotopy [37], [38].

[Figure 2 panel titles: "ROC: C-SA (unaware)", "ROC: D-SA (unaware)", "ROC: MF (unaware)".]
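Two of the figures quoted in the setup above, the size of the Doppler grid and the fraction of Nyquist-rate samples retained by the compressive front-end, follow from simple arithmetic. The sketch below reproduces them using only numbers stated in the text; the function names are hypothetical.

```python
from fractions import Fraction


def doppler_grid_size(ratio):
    """Number of Doppler bins for omega_max / delta_omega = ratio (two-sided grid plus zero)."""
    return 2 * ratio + 1


def compression_ratio(P, M):
    """Fraction of the M Nyquist-rate samples kept by P sampling channels."""
    return Fraction(P, M)


num_users = 10                                    # I
num_doppler = doppler_grid_size(5)                # |K|, with omega_max / delta_omega = 5
filterbank_channels = num_users * num_doppler     # I|K| channels in the MF filterbank
ratios = {P: compression_ratio(P, 255) for P in (60, 80, 100)}
```

Here `num_doppler` evaluates to 11 and `filterbank_channels` to 110, matching the text, and the three tested values of $P$ keep 4/17, 16/51 and 20/51 of the 255 Nyquist-rate samples, that is roughly 24%, 31% and 39%.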

A. Signal Detection Performance 

The first test compares the detection performance of the C-SA receiver against the MF and D-SA receivers for the case of completely unknown active user sets, as discussed in Section V-B1. In Fig. 2, all receivers are unaware of the random set $\mathcal{I}$ of active users. Specifically, receivers consider as active those components whose signal strength is at least 30% of the strongest component they estimate, i.e. in (44) $\rho_i = \max_{i,k,q} |\hat{\beta}_{i,k,q}|^2/3$. If no component meets this requirement, the channel is declared idle. Events for the winning hypothesis $\ell_*$ are generated according to (48). To first compare the sensitivity of the different receivers to active components, we define a signal hypothesis $\mathcal{H}_1$ corresponding to all the non-idle channel hypotheses, i.e. $\mathcal{A}_{\ell_*} \neq \emptyset$. Then, the detection sensitivity is measured in terms of the receiver operating characteristic (ROC) curve, tracing the probability of detection $P_d(\eta_0) = P(\eta_{\text{C-SA}}(\ell_*) > \eta_0 \mid \mathcal{H}_1)$ against the probability of false alarm $P_f(\eta_0) = P(\eta_{\text{C-SA}}(\ell_*) > \eta_0 \mid \mathcal{H}_0)$ when the channel is actually idle. Note that a positive detection may correspond to an incorrect identification of the specific users that are active. Thus, Section VII-B shows the rate of correct detection of active components for the same simulation scenario.
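An empirical ROC of the kind described above can be traced from Monte Carlo runs by sweeping the threshold over the observed test statistics. The sketch below is an illustration only: it assumes the likelihood ratios from signal-present and idle-channel trials are available as two lists, and the function names are hypothetical.

```python
def roc_point(stats_signal, stats_idle, eta0):
    """Empirical (P_f, P_d) at threshold eta0."""
    p_d = sum(s > eta0 for s in stats_signal) / len(stats_signal)
    p_f = sum(s > eta0 for s in stats_idle) / len(stats_idle)
    return p_f, p_d


def roc_curve(stats_signal, stats_idle):
    """Sweep the threshold over all observed statistics; points sorted by P_f."""
    thresholds = sorted(set(stats_signal) | set(stats_idle))
    return sorted(roc_point(stats_signal, stats_idle, t)
                  for t in [float("-inf")] + thresholds)
```

For example, with signal-present statistics [3, 5, 7, 9], idle statistics [1, 2, 4, 6] and threshold 4.5, three of four signal trials and one of four idle trials exceed the threshold, giving the point $(P_f, P_d) = (0.25, 0.75)$.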

As can be observed, although the C-SA receiver exploits less than $P/M = 80/255 \approx 1/3$ of the Nyquist-rate samples, the results from Fig. 2 show a modest degradation of the ROC compared to the MF receiver (less than 0.1 measured at $P_f(\eta_0) = 0.1$ and SNR = -8 dB). As expected, since the D-SA can leverage the additional observations to enhance its sensitivity, a growing gap $P_d(\eta_{\text{D-SA}}) - P_d(\eta_{\text{C-SA}})$ is observable as the SNR increases (measured at $P_f(\eta_0) = 0.1$) between the SNR = -6 dB and the SNR = -12 dB curves.

B. User Identification and Parameter Estimation Performance 



In the simulations shown in Fig. 3(a) and 3(b) we measure the detection performance of $\mathcal{I}$ by the rate of successful identification $P(\hat{\mathcal{I}} = \mathcal{I})$. In these simulations, the threshold $\eta_0$ is set to a level such that $P_f(\eta_0) \le 0.1$, and thus we trace the curve at the point $P_f(\eta_0) = 0.1$ on the figures. The two sets of figures correspond to, respectively, the case where a receiver has partial knowledge of the active user components (as in, for example, the GPS receiver discussed in Section V-B2) and the exact same case examined previously in Fig. 2, where the receivers are unaware of the user set $\mathcal{I}$. In the second case, for successful detection, not only do the elements of the sets have to be consistent, $\hat{\mathcal{I}} \subseteq \mathcal{I}$, but their cardinality also needs to be identical, $|\hat{\mathcal{I}}| = |\mathcal{I}|$. In the first case, instead, the receiver has partial knowledge of the active components, so what matters is that the components are correctly identified, since their number is known ahead of time.

Fig. 2. Comparison of ROC curves for the order-unaware receiver using the C-SA scheme with $P = 80$ channels (left), the D-SA scheme processing $M$ uncompressed Nyquist observations (middle), and the MF scheme (right); tested at SNR = {-12, -10, -8, -6} dB.

With a relatively short training sequence, we can see in Fig. 3(a) that the C-SA in the first case identifies the active user set with large probability (0.96 at SNR = 20 dB). The MF has worse performance due to the multi-user interference and to the presence of unresolvable paths. In fact, the MF receiver is unable to isolate the multi-path arrivals that fall within the same symbol period and, due to the presence of different Dopplers, its side-lobes may contribute negatively to the correlation, masking other active components. In contrast, the OMP algorithm in the C-SA scheme cancels the contributions from paths detected in previous iterations before updating the projections to search for other components (the OMP processing steps are summarized in Section VIII). It is evident, however, that at low SNR the C-SA scheme suffers from a loss due to the compression (-1 dB at the rate 0.6 with $P = 100$). This is clearly understood by observing the performance of the D-SA receiver as well. By processing uncompressed samples with the sparse reconstruction method, the D-SA curve combines the best of both worlds and, thus, its performance bounds the user identification rate for both the MF and the C-SA receivers in both examples. As shown in Fig. 3(b), the performance degrades when the receivers do not have side information on $|\mathcal{I}|$ (a difference of -0.13 for the CMS receiver against -0.3 for the MF at SNR = 20 dB). This is due to the cardinality mismatch, $|\hat{\mathcal{I}}| \neq |\mathcal{I}|$, that occurs while estimating the order.

The accuracy of the recovered set $\hat{\mathcal{A}}_{\ell_*}$ is evaluated by the root mean square error (RMSE) of the parameters $\tau_{i,r}$ and $\omega_{i,r}$ that are associated with the correctly identified users $\hat{\mathcal{I}} = \mathcal{I}$.



Thus, RMSE($\tau$) and RMSE($\omega$) are the average RMSEs of the delay parameters and the Doppler frequencies, respectively.

[Figure 3: panel (a) "Partial User Knowledge", titled "User Identification Rate (order aware)"; panel (b) "No User Knowledge", titled "User Identification Rate (order unaware)". Horizontal axis: SNR. Curves: C-SA (P=60), C-SA (P=80), C-SA (P=100), D-SA, MF.]

Fig. 3. (a) Successful user identification rate $P(\hat{\mathcal{I}} = \mathcal{I})$ for the order-aware receiver implementing the MF receiver (blue), the D-SA receiver (red), or the CMS receiver with C-SA (grey shades) with $P = \{60, 80, 100\}$. (b) Successful user identification rate $P(\hat{\mathcal{I}} = \mathcal{I})$ for the order-unaware receiver implementing the MF receiver (blue), the D-SA receiver (red), or the CMS receiver (grey shades) with $P = \{60, 80, 100\}$.

To verify the accuracy of the parameter estimates of $\tau_{i,r}$ and $\omega_{i,r}$, we trace the RMSEs for the order-aware case. Once again we observe, from Fig. 4(a) and Fig. 4(b), that the performance in terms of RMSE($\tau$) and RMSE($\omega$) is enhanced by the detector that better leverages the presence of the multi-path. In fact, at SNR = 20 dB, the accuracy of the CMS, with $P = 100$, and of the D-SA approaches the grid resolution, i.e. RMSE($\tau$) $\approx \Delta\tau$ and RMSE($\omega$) $\approx \Delta\omega$. Conversely, the contribution of the unresolvable paths, in either frequency or time, to the correlation adversely affects the parameter selection. Not canceling the previously selected components contributes to a large error after the selection of the dominant paths, as the same arrivals are likely to be selected more than once due to the presence of correlated components. These errors impact the highest resolution, since at SNR = 20 dB RMSE($\tau$) $> 2\Delta\tau$ and RMSE($\omega$) $> 2\Delta\omega$. Instead, at low SNR, the performance is bounded by the maximum error given by the search space, which is a function of $\omega_{\max}$ and $\tau_{\max}$, respectively, due to the early detection resulting from heavy noise.

[Figure 4: panel (a) "Delay RMSE Performance"; panel (b) "Doppler RMSE Performance". Horizontal axis: SNR. Curves: C-SA (P=60), C-SA (P=100), D-SA, MF.]

Fig. 4. (a) RMSE($\tau$) as a function of SNR = {-12, -8, ..., 20} implemented by the C-SA scheme (grey shades) with $P = \{60, 100\}$, by the D-SA (red) or by the MF (blue). (b) RMSE($\omega$) as a function of SNR = {-12, -8, ..., 20} implemented by the C-SA scheme (grey shades) with $P = \{60, 100\}$, by the D-SA (red) or by the MF (blue).

[Figure 5: panel (a) "ROC and Identification Rate", with sub-plots "ROC" and "Source Identification Rate"; panel (b) "Parameter RMSE", with sub-plots "Delay RMSE" and "Doppler RMSE". Horizontal axes: $P_f(\eta)$ for the ROC, SNR otherwise.]

Fig. 5. (a) ROC curve at SNR = -8 dB and user identification rate $P(\hat{\mathcal{I}} = \mathcal{I})$ of the C-SA scheme, against the MF receiver and different choices of $\mathbf{B}$. (b) Delay and Doppler estimation RMSE of the C-SA scheme, against different random designs of $\mathbf{B}$ and the MF.

C. Optimality of Compressive Samplers 

In this subsection, we briefly compare the performance of
the C-SA scheme using a P = 100-channel CMS architecture
with the optimal samplers against other random projection
schemes used in compressed sensing. The ROC curve and the
user identification rate $P(\hat{\mathcal{I}} = \mathcal{I})$ in Fig. 5(a) show that
the optimal sampling kernel, denoted by C-SA-KL, performs
better than random designs of $\mathbf{B}$, namely matrices with
Gaussian entries (C-SA-G), matrices with Bernoulli entries
(C-SA-B), or randomly selected rows of a DFT matrix (C-SA-F).
Fig. 5(b) shows that the RMSE of the delay and Doppler
estimates is also improved.
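As a rough illustration of the three random baselines named above, the following sketch draws a P x N matrix with Gaussian entries, Bernoulli (+/-1) entries, or randomly selected rows of a DFT matrix. The function names, the 1/sqrt(N) scaling, the real-valued Gaussian entries, and the toy sizes are assumptions made for illustration; this section does not specify the normalization used in the experiments.

```python
import cmath
import math
import random

def gaussian_design(P, N, rng):
    """P x N matrix with i.i.d. real Gaussian entries (scaling is an assumption)."""
    scale = 1.0 / math.sqrt(N)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(N)] for _ in range(P)]

def bernoulli_design(P, N, rng):
    """P x N matrix with i.i.d. +/-1 entries, scaled by 1/sqrt(N)."""
    scale = 1.0 / math.sqrt(N)
    return [[rng.choice((-1.0, 1.0)) * scale for _ in range(N)] for _ in range(P)]

def partial_dft_design(P, N, rng):
    """P distinct rows drawn at random from the N x N unitary DFT matrix."""
    rows = rng.sample(range(N), P)
    scale = 1.0 / math.sqrt(N)
    return [[cmath.exp(-2j * math.pi * r * n / N) * scale for n in range(N)]
            for r in rows]

rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
P, N = 4, 16            # hypothetical toy sizes, not the paper's P = 100
designs = {
    "C-SA-G": gaussian_design(P, N, rng),
    "C-SA-B": bernoulli_design(P, N, rng),
    "C-SA-F": partial_dft_design(P, N, rng),
}
for name, B in designs.items():
    print(name, len(B), "x", len(B[0]))
```

The rows of the partial DFT design are orthonormal by construction, whereas the Gaussian and Bernoulli rows are only approximately so; none of the three uses knowledge of the signal model, which is what distinguishes them from the KL-optimized design.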

D. Data Detection Performance after Link Acquisition 

As a complementary evaluation of the proposed C-SA
acquisition scheme, we also report the data detection
performance in terms of Symbol Error Rate (SER), using
linear minimum mean-square-error (LMMSE) multi-user
receivers constructed from the estimated signal spaces, with
the link parameters acquired by the C-SA, D-SA and
MF link acquisition schemes, respectively. For simplicity, we
consider the scenario where the receivers initially have no
knowledge of the active user components, as discussed in
Section V-B. We remark that the SER performance of all the
receivers is consistently influenced by the pre-defined grid
resolution $\Delta\tau$ and $\Delta\omega$; therefore the error floors of
parameter estimation observed in Figs. 4(a) and 4(b) ultimately
result in an irreducible error floor on the SER curves of all the
receivers. Thus, for a given resolution, Fig. 6 shows that the
parameters obtained by the C-SA scheme lead to a much better
detection performance than those of the MF scheme, while
closely approaching the performance of the D-SA scheme. If
necessary, the resolution can be made finer so that the error
floor decays to a level that satisfies the receiver requirement
in each case, and other decoding strategies, as well as
decision-directed and blind approaches, can be combined to
further improve performance. Once the signal subspace is
coarsely identified, fine tuning and tracking the signal space
and decoding the symbols optimally become a rather standard
problem. Hence this example is only meant to illustrate the
quality of the initialization to be expected from the
architectures examined.

[Figure 6: Average SER (order unaware) versus SNR.]

Fig. 6. SER performance of the CMS receiver with C-SA acquisition scheme
against the D-SA and MF acquisition schemes.



VIII. Cost Analysis 

In this section, we explicitly analyze the implementation
costs of the MF and the proposed C-SA architectures in terms
of storage requirement and computational complexity. The cost
benefits are illustrated in two regimes: an analog
implementation, which corresponds to what the paper describes
mathematically in detail, and a digital implementation, which
would be necessary if the projections onto the compressive
samplers $\psi_p(t)$ cannot be implemented in the form of analog
filters. In the latter case, the compressive procedure would
emulate our implementation in simulations, where Nyquist
samples of the signal $x(t)$ are projected onto digital filters
matched to the samples of $\psi_p(t)$.
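The digital regime described above can be sketched as a bank of P inner products between a window of Nyquist samples and the sampled kernels. This is a minimal sketch, assuming the discretized inner product c_p[n] = sum_m x[nD + m] * conj(psi_p[m]); the function name `compressive_samples`, the conjugation convention, and the toy kernels are assumptions for illustration and are not taken from the paper.

```python
def compressive_samples(x, kernels, n, D):
    """Project the window of x starting at sample n*D onto each sampled kernel.

    x       : sequence of (complex) Nyquist samples
    kernels : list of P sequences, the sampled kernels psi_p[m]
    n, D    : shift index and shift step in samples
    Returns the list [c_1[n], ..., c_P[n]].
    """
    start = n * D
    out = []
    for psi in kernels:
        window = x[start:start + len(psi)]
        # Inner product with the conjugated kernel (assumed convention).
        out.append(sum(xm * complex(pm).conjugate() for xm, pm in zip(window, psi)))
    return out

# Toy example: two orthogonal length-2 kernels; x contains kernel 0 at shift n = 1.
kernels = [[1, 1j], [1, -1j]]
x = [0, 0, 1, 1j, 0, 0]
c = compressive_samples(x, kernels, n=1, D=2)
print(c)
```

Only P values per shift have to be buffered downstream, which is the storage saving that the cost metric in this section measures.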

Storage requirements are measured by the A/D hardware cost,
i.e., the number of sampling channels P, which is the buffer
size of the A/D samples. Computational complexity is
evaluated by counting the total number of complex additions
and multiplications, and by the average CPU run time spent on
executing the tasks (on a 64-bit i7 920 CPU running at
2.67 GHz). In the following, we first settle on the sparse
recovery solver for the SR-LRT in the C-SA scheme, and then
proceed with our comparison using the chosen solver.

A. Sparse Recovery Solver for the CMS Receiver: The OMP 
Algorithm 

The C-SA receiver spends its greatest effort in solving the
optimization (41). Fast $\ell_1$ minimizers like SpaRSA [36] or
$\ell_1$-Homotopy [37], [38] are often the methods of choice. The
former greatly reduces the complexity by approximating the
Hessian in the gradient descent by a diagonal matrix, whereas
the latter inverts a system of equations whose number of
unknowns, at each iteration, remains restricted to the non-zero
elements of the sparse vector estimate. Greedy algorithms like
the OMP [28] are also efficient approximations to the solution
of sparse problems [35]. The OMP algorithm iteratively
detects the strongest element in the sparse vector and removes
its contribution in the next iteration; thus, the number of
iterations required by OMP is bounded by the maximum number of
components that can exist in the signal, which in our case is
$|\mathcal{I}|R$. The average CPU run time spent in solving (41)
with different solvers is shown in Fig. 7(a) as a function of P,
where the OMP algorithm requires significantly less computation
time. Thus, in Fig. 7(b) we further compare the average CPU run
time of the CMS receiver using OMP against the MF receiver,
the implementation details of which are discussed in the
following subsection. OMP has smaller complexity primarily
because it stops as soon as all the strong entries have been
detected. In contrast, $\ell_1$-Homotopy and SpaRSA do not limit
the search to a single set, but rather explore the feasible set
by selecting and de-selecting elements of the support vector
($\ell_1$-Homotopy), or by shrinking it through a gradient descent
(SpaRSA), until a desired convergence criterion has been met.
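The greedy loop described above (detect the strongest remaining component, re-fit, remove its contribution, stop after a bounded number of iterations or when the residual is small) can be sketched as follows. This is a generic, real-valued OMP written with the standard library only, not the authors' implementation: the names `omp` and `solve_least_squares`, the normal-equation solver, and the toy dictionary are assumptions for illustration, and the selected columns are assumed linearly independent.

```python
import math

def solve_least_squares(cols, c):
    """Least-squares coefficients for the given columns via the normal equations."""
    k = len(cols)
    # Augmented matrix [A^T A | A^T c].
    aug = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
           + [sum(a * b for a, b in zip(cols[i], c))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        piv = max(range(i, k), key=lambda r: abs(aug[r][i]))
        aug[i], aug[piv] = aug[piv], aug[i]
        for r in range(k):
            if r != i:
                f = aug[r][i] / aug[i][i]
                aug[r] = [vr - f * vi for vr, vi in zip(aug[r], aug[i])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def omp(columns, c, max_iter, tol=1e-9):
    """Greedy recovery of c ~ sum_j beta_j * columns[j]; returns {index: beta_j}."""
    support, coeffs = [], []
    residual = list(c)
    for _ in range(max_iter):
        small = math.sqrt(sum(r * r for r in residual)) < tol
        if small or len(support) == len(columns):
            break
        # Projection and detection: strongest correlation with the residual.
        scores = [abs(sum(a * r for a, r in zip(col, residual))) for col in columns]
        best = max((j for j in range(len(columns)) if j not in support),
                   key=lambda j: scores[j])
        support.append(best)
        # Update: re-fit all selected coefficients, then cancel their contribution.
        coeffs = solve_least_squares([columns[j] for j in support], c)
        residual = list(c)
        for j, b in zip(support, coeffs):
            residual = [r - b * a for r, a in zip(residual, columns[j])]
    return dict(zip(support, coeffs))

# Toy dictionary of five unit-norm columns in R^4 (hypothetical values).
columns = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
           [0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
c = [2 * a - 3 * b for a, b in zip(columns[1], columns[4])]
result = omp(columns, c, max_iter=3)
print(result)
```

In this toy case the loop stops after two selections because the residual vanishes, which mirrors the early-stopping behaviour that the text credits for the lower run time of OMP.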

B. C-SA Scheme vs. MF Scheme 

Using the OMP algorithm for sparse recovery, we summarize
the steps of the C-SA and MF schemes in Algorithms 1
and 2, respectively.

Based on the algorithm descriptions, we provide the order
of the storage cost and computational complexity in the tables
below. Storage accounts for a data path component,
dynamically updated with the streaming data that correspond
to new observations to be processed, and for a static
component, which stores the filter or sampling kernel
parameters needed to perform signal processing on the data.

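Algorithm 1 C-SA Scheme

(CMS.1) obtain compressive samples $\mathbf{c}[n]$ at the $n$th shift;
(CMS.2) initialize $\boldsymbol{\beta}^{0} = \mathbf{0}$, $\Lambda_n^{0} = \emptyset$, $\mathbf{S} = \mathbf{B}\mathbf{M}$, $\mathbf{S}_0 = \emptyset$,
        $j = 1$ and run the OMP algorithm:
    (OMP.1) remove interference: $\boldsymbol{\delta}^{j} = \mathbf{c}[n] - \mathbf{S}_{j-1}\boldsymbol{\beta}^{j-1}$;
    (OMP.2) projection: $\boldsymbol{\xi}^{j} = \mathbf{S}^{H}\boldsymbol{\delta}^{j}$, with $\boldsymbol{\xi}^{j} = [\cdots, \xi^{j}_{i,k,q}, \cdots]^{T}$;
    (OMP.3) detection: $\Lambda_n^{j} = \Lambda_n^{j-1} \cup \{(i,k,q)\}$ with
            $(i,k,q) = \arg\max_{i,k,q} |\xi^{j}_{i,k,q}|^{2}$;
    (OMP.4) update $\mathbf{S}$ and $\mathbf{S}_j$ as the sub-matrices of $\mathbf{B}\mathbf{M}$ formed by the
            columns not yet selected and by the columns in $\Lambda_n^{j}$, respectively;
    (OMP.5) update the link vector:
            $[\boldsymbol{\beta}^{j}]_{\Lambda_n^{j}} = \mathbf{S}_j^{\dagger}\mathbf{c}[n]$,   (56)
            with all remaining entries of $\boldsymbol{\beta}^{j}$ set to 0;   (57)
    (OMP.6) stop if either $j = |\mathcal{I}|R$ or $\|\mathbf{c}[n] - \mathbf{B}\mathbf{M}\boldsymbol{\beta}^{j}\| < \epsilon$;
            otherwise set $j = j + 1$ and return to (OMP.1).
(CMS.3) Evaluate the likelihood ratio $\eta_{\text{C-SA}}(n)$ and
        check if it exceeds $\eta_0$.
(CMS.4) If yes, then extract components accordingly (order-aware,
        order-unaware).

Algorithm 2 MF Scheme

(MF.1) obtain the sample array $\mathbf{C}_{\text{MF}}[n]$ in (17) from the MF
       filterbank;
(MF.2) identify the maximum output and check if it exceeds
       $\rho_i$ for all $i = 1, \cdots, I$;
(MF.3) If yes, then extract the delay-Doppler set for each
       active user as in the MF acquisition procedure described earlier.

It is seen in Figs. 8 and 9 that both our C-SA scheme
with the CMS receiver and the MF receiver (using exhaustive
matched filtering) have computational complexities that scale
linearly with the dimension of the search space $I|\mathcal{K}||\mathcal{Q}|$.
However, the data path storage of the C-SA receivers is greatly
reduced. Another storage gain is found in the case of a digital
implementation: because there are fewer projections to be
made, a smaller amount of static memory is in principle
required to store the samples of $\psi_p(t)$, unless the MFs are
synthesized on the fly.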
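For comparison with the greedy C-SA loop, the exhaustive matched-filter search summarized in Algorithm 2 amounts to correlating the observation with every candidate template and thresholding the strongest output. The sketch below is a simplified single-threshold version under stated assumptions: the function name `mf_detect`, the dictionary of templates keyed by a hypothetical (user, Doppler index, delay index) triple, and the squared-magnitude statistic are illustrative choices, not the paper's exact detector.

```python
def mf_detect(x, templates, threshold):
    """Return (label, statistic) of the strongest matched-filter output,
    or None if no output exceeds the threshold."""
    best_label, best_stat = None, 0.0
    for label, s in templates.items():
        # Correlate the observation with the conjugated template.
        out = sum(xm * complex(sm).conjugate() for xm, sm in zip(x, s))
        stat = abs(out) ** 2
        if stat > best_stat:
            best_label, best_stat = label, stat
    if best_stat > threshold:
        return best_label, best_stat
    return None

# Hypothetical templates indexed by (user i, Doppler index k, delay index q).
templates = {
    (1, 0, 0): [1, 1, 1, 1],
    (1, 0, 1): [1, -1, 1, -1],
}
x = [2, -2, 2, -2]
detection = mf_detect(x, templates, threshold=10.0)
print(detection)
```

The loop visits every template at every shift, which is why the MF cost grows with both the preamble length M and the size of the search space.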

When the architectures are implemented in the digital
domain, the C-SA receiver also leads to a great reduction in
computational complexity, with an approximate ratio to the
MF receiver complexity of

$$\frac{MP + I|\mathcal{K}||\mathcal{Q}| + P^3}{M I|\mathcal{K}||\mathcal{Q}|} \approx \frac{P}{I|\mathcal{K}||\mathcal{Q}|} + \frac{P^3}{M I|\mathcal{K}||\mathcal{Q}|}. \tag{58}$$

Clearly, when the preamble sequence is long and the search
space is large, i.e., $M I|\mathcal{K}||\mathcal{Q}| \gg P^3$, this ratio becomes less
than 1 and the C-SA architecture leads to computational savings
while, as seen in our simulations, maintaining comparable
performance. This is also why the C-SA receiver implemented
in the simulation (see Fig. 7(b)) considerably outperforms the
MF receiver in terms of average CPU run time for large
M > 3000 with P = 60, 80, 100. On the other hand,
when M is small (e.g., M = 255 in Section VII), the MF
receiver has less computation time, as seen in Fig. 7(b) for
M < 3000 against P = 60, 80, 100; but such a short preamble
does not provide sufficient processing gain for reliable link
acquisition in the presence of multipath, as can be clearly
seen from the numerical results (e.g., see Fig. 3(a)). Thus,
when M is small, the gain of the C-SA receiver also lies
in the superior acquisition performance demonstrated by the
numerical results, except for the low SNR region where the
C-SA is not sufficiently sensitive.

[Figure 7. Panel (a), "Average CPU Runtime": run time vs. the number of sampling channels P. Panel (b), "Average CPU Runtime (MF vs OMP)": run time vs. the length of the preamble sequence M, with curves OMP (P=60), OMP (P=80), OMP (P=100) and MF.]

Fig. 7. (a) Average CPU run time for the CMS receiver, implemented with different solvers (OMP, SpaRSA, $\ell_1$-Homotopy), as a function of P =
{10, 20, ..., 120}. (b) Average CPU run time of a CMS receiver using P = {60, 80, 100} compressed observations against the MF receiver. The average
run time is measured against the preamble length M.

Fig. 8. Complexity breakdown for the CMS receivers at every shift t = nD.

Fig. 9. Complexity breakdown for the MF receivers at every shift t = nD.
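The behaviour of the ratio in (58) is easy to explore numerically. The following sketch evaluates the left-hand side of (58) for a short and a long preamble. The function name `complexity_ratio` and the search-space dimension of 400 are hypothetical values chosen only to illustrate the crossover; they are not the parameters used in the paper's simulations.

```python
def complexity_ratio(M, P, search_dim):
    """Approximate C-SA / MF operation-count ratio, left-hand side of (58).

    M          : preamble length
    P          : number of compressive channels
    search_dim : dimension of the search space, I * |K| * |Q|
    """
    return (M * P + search_dim + P ** 3) / (M * search_dim)

search_dim = 400  # hypothetical I*|K|*|Q|, for illustration only
short_preamble = complexity_ratio(M=255, P=100, search_dim=search_dim)
long_preamble = complexity_ratio(M=5000, P=100, search_dim=search_dim)
print(round(short_preamble, 3), round(long_preamble, 4))
```

With these illustrative numbers the ratio is above 1 for the short preamble and below 1 for the long one, in line with the qualitative statement that the C-SA receiver pays off once M I|K||Q| is large compared with P^3.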

IX. Conclusions 

In this paper, we proposed the SR-LRT receiver using a
unified CMS architecture for link acquisition, which we refer
to as the C-SA scheme. This scheme uses a sequential SR-LRT
that jointly detects signal presence and recovers the
active users with their link parameters. We optimized the
CMS architecture to maximize the average Kullback-Leibler
distance among the hypotheses tested in the SR-LRT and
showed that, with the optimal compressive samplers we propose,
the receiver's detection performance exceeds that obtained with
conventional compressed sensing alternatives. Furthermore,
through the numerical comparison of the proposed architecture
with the D-SA scheme and the MF approach, we have shown that
the C-SA receiver can scale down its processing storage
and complexity with greater flexibility, while maintaining
satisfactory performance.

Appendix A 
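Proof of Theorem 1

Substituting (23) into (20), we have

$$c_p[n] = \sum_{i=1}^{I}\sum_{k\in\mathcal{K}}\sum_{q\in\mathcal{Q}} \alpha_{i,k,q}\, e^{jk\Delta\omega\ell D}\, \langle \phi_{i,k,q}(t-\ell D), \psi_p(t-nD)\rangle + \langle v(t), \psi_p(t-nD)\rangle. \tag{59}$$

Define the $P \times I|\mathcal{K}||\mathcal{Q}|$ matrix

\begin{align}
\big[\mathbf{M}_{\psi\phi}[n-\ell]\big]_{p,(i,k,q)} &= R_{\psi_p\phi_{i,k,q}}[(n-\ell)D] \tag{60}\\
&= \langle \phi_{i,k,q}(t-\ell D), \psi_p(t-nD)\rangle \tag{61}
\end{align}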



and denote $v_p[n] = \langle v(t), \psi_p(t-nD)\rangle$ as the sample
of filtered noise, whose covariance can be obtained as
$E\{v_p[n]v^*_{p'}[n]\} = \sigma^2\langle\psi_p(t),\psi_{p'}(t)\rangle$ using $E\{v(t)v^*(s)\} =
\sigma^2\delta(t-s)$. Therefore, the noise sample vector
$\mathbf{v}[n] = [v_1[n],\cdots,v_P[n]]^T$ has covariance
matrix $\mathbf{R}_{vv} = \sigma^2\mathbf{R}_{\psi\psi}$, where $[\mathbf{R}_{\psi\psi}]_{p,p'} = \langle\psi_p(t),\psi_{p'}(t)\rangle$ is
the Gram matrix of the kernels $\psi_p(t)$.

Denote $\mathbf{c}[n] = [c_1[n],\cdots,c_P[n]]^T$ as the length-$P$ vector
containing the samples acquired from the CMS filterbank at
time $t = nD$. Given the link vector $\boldsymbol{\alpha}[\ell] = [\cdots,\alpha_{i,k,q},\cdots]^T$
at the $\ell$th shift as in (24), we then have the observation model
in matrix form

$$\mathbf{c}[n] = \mathbf{M}_{\psi\phi}[n-\ell]\,\boldsymbol{\Gamma}[\ell]\,\boldsymbol{\alpha}[\ell] + \mathbf{v}[n]. \tag{62}$$

Using a sampling kernel constructed as

$$\psi_p(t) = \sum_{i=1}^{I}\sum_{k\in\mathcal{K}}\sum_{q\in\mathcal{Q}} b_{p,(i,k,q)}\,\phi_{i,k,q}(t), \tag{63}$$

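the cross-correlation $R_{\psi_p\phi_{i,k,q}}[(n-\ell)D]$ in (60) can be
evaluated as

\begin{align}
R_{\psi_p\phi_{i,k,q}}[(n-\ell)D] &= \langle \phi_{i,k,q}(t-\ell D), \psi_p(t-nD)\rangle \tag{64}\\
&= \sum_{i'=1}^{I}\sum_{q'\in\mathcal{Q}}\sum_{k'\in\mathcal{K}} b_{p,(i',k',q')}\, R_{\phi_{i',k',q'}\phi_{i,k,q}}[(n-\ell)D], \tag{65}
\end{align}

where $R_{\phi_{i',k',q'}\phi_{i,k,q}}[(n-\ell)D] = \langle \phi_{i,k,q}(t-\ell D), \phi_{i',k',q'}(t-nD)\rangle$.
With a change of variable $t' = t - nD - q'\Delta\tau$, the correlation
can be written as

$$R_{\phi_{i',k',q'}\phi_{i,k,q}}[(n-\ell)D] = e^{jk\Delta\omega(n-\ell)D}\, e^{-jk\Delta\omega q'\Delta\tau}\, R^{(k-k')}_{i,i'}[(q'-q)\Delta\tau + (n-\ell)D], \tag{66}$$

where $R^{(k-k')}_{i,i'}(\Delta t)$ is the ambiguity function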

R { ^ k ' ] {At) = I - A^e-^-^M*. (67) 



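From (64), $\mathbf{M}_{\psi\phi}[n-\ell]$ in (60) can be re-written as
$\mathbf{M}_{\psi\phi}[n-\ell] = \mathbf{B}\mathbf{M}_{\phi\phi}[n-\ell]$, where

$$\big[\mathbf{M}_{\phi\phi}[n-\ell]\big]_{(i',k',q'),(i,k,q)} = R_{\phi_{i',k',q'}\phi_{i,k,q}}[(n-\ell)D]. \tag{68}$$

Then the observation model can be re-written as

$$\mathbf{c}[n] = \mathbf{B}\mathbf{M}_{\phi\phi}[n-\ell]\,\boldsymbol{\Gamma}[\ell]\,\boldsymbol{\alpha}[\ell] + \mathbf{v}[n]. \tag{69}$$

Finally, the Gram matrix of the $\psi_p(t)$'s can be obtained
accordingly as

$$\mathbf{R}_{\psi\psi} = \mathbf{B}\mathbf{M}_{\phi\phi}[0]\mathbf{B}^H, \tag{70}$$

which gives the noise covariance as $\mathbf{R}_{vv} = \sigma^2\mathbf{B}\mathbf{M}_{\phi\phi}[0]\mathbf{B}^H$.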

Appendix B 
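Proof of Theorem 2

From (59), each sample $c_p[n]$, $p = 1,\cdots,P$, from the CMS
architecture can be expressed as

$$c_p[n] = \sum_{i'=1}^{I}\sum_{k'\in\mathcal{K}}\sum_{q'\in\mathcal{Q}} b_{p,(i',k',q')} \sum_{i=1}^{I}\sum_{k\in\mathcal{K}}\sum_{q\in\mathcal{Q}} \alpha_{i,k,q}\, e^{jk\Delta\omega\ell D}\, R_{\phi_{i',k',q'}\phi_{i,k,q}}[(n-\ell)D] + v_p[n]. \tag{71}$$

Without loss of generality, let $D/\Delta\tau = N \in \mathbb{Z}$. The summation
$\sum_{q\in\mathcal{Q}} \alpha_{i,k,q}\, e^{jk\Delta\omega\ell D}\, R_{\phi_{i',k',q'}\phi_{i,k,q}}[(n-\ell)D]$
can be adjusted with respect to the relative time index $[n-\ell]$
by re-writing the correlation in (66) as

\begin{align}
R_{\phi_{i',k',q'}\phi_{i,k,q}}[(n-\ell)D] &= e^{jk\Delta\omega(n-\ell)D}\, e^{-jk\Delta\omega q'\Delta\tau}\, R^{(k-k')}_{i,i'}[(q'-q)\Delta\tau + (n-\ell)D] \tag{72}\\
&= e^{jk\Delta\omega(n-\ell)D}\, e^{-jk\Delta\omega q'\Delta\tau}\, R^{(k-k')}_{i,i'}[(q'-(q-(n-\ell)N))\Delta\tau] \notag\\
&= e^{jk\Delta\omega(n-\ell)D}\, R_{\phi_{i',k',q'}\phi_{i,k,q-(n-\ell)N}}[0]. \tag{73}
\end{align}

With a change of variable $q'' = q - (n-\ell)N$ and substituting the
equivalent correlation in (72), we have

\begin{align}
\sum_{q\in\mathcal{Q}} \alpha_{i,k,q}\, e^{jk\Delta\omega\ell D}\, R_{\phi_{i',k',q'}\phi_{i,k,q}}[(n-\ell)D]
&= e^{jk\Delta\omega nD} \sum_{q\in\mathcal{Q}} \alpha_{i,k,q}\, R_{\phi_{i',k',q'}\phi_{i,k,q-(n-\ell)N}}[0] \notag\\
&= e^{jk\Delta\omega nD} \sum_{q''\in\mathcal{Q}} \alpha_{i,k,q''+(n-\ell)N}\, R_{\phi_{i',k',q'}\phi_{i,k,q''}}[0]. \notag
\end{align}

With the re-formulation, (71) is re-written as

$$c_p[n] = \sum_{i'=1}^{I}\sum_{k'\in\mathcal{K}}\sum_{q'\in\mathcal{Q}} b_{p,(i',k',q')} \sum_{i=1}^{I}\sum_{k\in\mathcal{K}}\sum_{q\in\mathcal{Q}} e^{jk\Delta\omega nD}\, \alpha_{i,k,q+(n-\ell)N}\, R_{\phi_{i',k',q'}\phi_{i,k,q}}[0] + v_p[n].$$

By letting $\mathbf{M} = \mathbf{M}_{\phi\phi}[0]$ and defining the shifted link vector
$\boldsymbol{\alpha}[n]$ at the $n$th shift as

$$\big[\boldsymbol{\alpha}[n]\big]_{(i,k,q)} = \alpha_{i,k,q+(n-\ell)N}, \tag{74}$$

the observation model can be equivalently re-written as

$$\mathbf{c}[n] = \mathbf{B}\mathbf{M}\boldsymbol{\Gamma}[n]\boldsymbol{\alpha}[n] + \mathbf{v}[n]. \tag{75}$$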

Appendix C 
Proof of Proposition 1 

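The pair-wise KL distance in (49) can be re-written with
the trace operator $\mathrm{Tr}(\cdot)$ as

$$\mathcal{D}(\mathcal{H}_{\mathcal{S}}\|\mathcal{H}_{\mathcal{S}'}) = \frac{1}{\sigma^2}\mathrm{Tr}\left[\mathbf{M}^H\mathbf{B}^H(\mathbf{B}\mathbf{M}\mathbf{B}^H)^{-1}\mathbf{B}\mathbf{M}\mathbf{R}_{\beta_{\mathcal{S}},\beta_{\mathcal{S}'}}\right],$$

where $\mathbf{R}_{\beta_{\mathcal{S}},\beta_{\mathcal{S}'}} = (\boldsymbol{\beta}_{\mathcal{S}} - \boldsymbol{\beta}_{\mathcal{S}'})(\boldsymbol{\beta}_{\mathcal{S}} - \boldsymbol{\beta}_{\mathcal{S}'})^H$. Then the
average pair-wise KL distance $D$ in (50) becomes

$$D = \frac{1}{\sigma^2}\mathrm{Tr}\left[\mathbf{M}^H\mathbf{B}^H(\mathbf{B}\mathbf{M}\mathbf{B}^H)^{-1}\mathbf{B}\mathbf{M}\mathbf{R}\right],$$

where $\mathbf{R} = \sum_{\mathcal{S}}\sum_{\mathcal{S}'} \gamma_{\mathcal{S},\mathcal{S}'}\mathbf{R}_{\mathcal{S},\mathcal{S}'}$ and $\mathbf{R}_{\mathcal{S},\mathcal{S}'}$ is the averaged
covariance matrix of $\boldsymbol{\beta}_{\mathcal{S}}$ over the amplitudes

$$\mathbf{R}_{\mathcal{S},\mathcal{S}'} = \int\!\!\int P(\boldsymbol{\beta}_{\mathcal{S}})P(\boldsymbol{\beta}_{\mathcal{S}'})\,\mathbf{R}_{\beta_{\mathcal{S}},\beta_{\mathcal{S}'}}\, d\boldsymbol{\beta}_{\mathcal{S}}\, d\boldsymbol{\beta}_{\mathcal{S}'}.$$

Given $P(\boldsymbol{\beta}_{\mathcal{S}}) = \prod_{(i,k,q)\in\mathcal{S}} P(\beta_{i,k,q})$ with $\int \boldsymbol{\beta}_{\mathcal{S}}P(\boldsymbol{\beta}_{\mathcal{S}})\,d\boldsymbol{\beta}_{\mathcal{S}} = \mathbf{0}$
and $\int |\beta_{i,k,q}|^2 P(\beta_{i,k,q})\,d\beta_{i,k,q} = $ constant, the averaged
matrix $\mathbf{R}_{\mathcal{S},\mathcal{S}'}$ is diagonal. Furthermore, if the weights
$\gamma_{\mathcal{S},\mathcal{S}'}$ are constant for all $\mathcal{S},\mathcal{S}'$ and the individual weighting
function $P(\beta_{i,k,q})$ is identical for all $i, k, q$, then $\mathbf{R}$ also
satisfies $\mathbf{R} \propto \mathbf{I}$, because the summation over $\mathcal{S},\mathcal{S}'$ is symmetric
and hence produces equal sums on the diagonal. Thus the result follows.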
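Appendix D
Proof of Theorem 3

By analogy with Lemma 1, we have $\mathbf{S} = \mathbf{M}$ and $\mathbf{G} = \mathbf{M}\mathbf{M}^H$.
Let $\mathbf{B} = [\mathbf{b}_1 \cdots \mathbf{b}_P]^H$, where $\mathbf{b}_p$ is a
length-$I|\mathcal{K}||\mathcal{Q}|$ column vector such that $\mathbf{b}_p = \mathbf{w}_p$. In this
setting, according to Lemma 1, the optimal $\mathbf{b}_p$ is chosen as
the generalized eigenvector of the matrix pair $(\mathbf{S}, \mathbf{G})$ such
that $\mathbf{M}\mathbf{b}_p = \lambda_p\mathbf{M}\mathbf{M}^H\mathbf{b}_p$. Using the eigen-decomposition
$\mathbf{M} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{U}^H$ and the property $\mathbf{U}^H\mathbf{U} = \mathbf{I}$, we have

$$\boldsymbol{\Sigma}\mathbf{U}^H\mathbf{b}_p = \lambda_p\boldsymbol{\Sigma}\boldsymbol{\Sigma}^H\mathbf{U}^H\mathbf{b}_p, \quad p = 1,\cdots,P. \tag{76}$$

If we choose $\mathbf{b}_p = \mathbf{u}_p$, where $\mathbf{u}_p$ is the $p$th column of the
matrix $\mathbf{U}$, then the above relationship holds for all $p = 1,\cdots,P$
as long as $P \le \mathrm{rank}(\boldsymbol{\Sigma})$, because $\mathbf{u}_i^H\mathbf{u}_j = \delta[i-j]$. This gives

\begin{align}
\text{L.H.S.} &= \boldsymbol{\Sigma}\mathbf{U}^H\mathbf{u}_p = \sigma_p\mathbf{e}_p, \notag\\
\text{R.H.S.} &= \lambda_p\boldsymbol{\Sigma}\boldsymbol{\Sigma}^H\mathbf{U}^H\mathbf{u}_p = \lambda_p\sigma_p^2\mathbf{e}_p, \notag
\end{align}

leading to a generalized eigenvalue of $\lambda_p = 1/\sigma_p$, where $\sigma_p > 0$
is the $p$th eigenvalue in $\boldsymbol{\Sigma}$ and $\mathbf{e}_p$ is the canonical basis vector
with 1 in the $p$th entry and 0 otherwise. Denote by $\boldsymbol{\Sigma}_P$ and
$\mathbf{U}_P$ the principal eigenvalue and eigenvector matrices. Then
the optimal $\mathbf{B}$ is chosen as $\mathbf{B} = \boldsymbol{\Xi}_P\mathbf{U}_P^H$, where $\boldsymbol{\Xi}_P$ is an
arbitrary non-singular $P \times P$ matrix. According to (51), this
choice gives

$$D \propto \mathrm{Tr}\left[\boldsymbol{\Xi}_P^{-H}\boldsymbol{\Sigma}_P^{-1}\boldsymbol{\Xi}_P^{-1}\,\boldsymbol{\Xi}_P\boldsymbol{\Sigma}_P^{2}\boldsymbol{\Xi}_P^{H}\right] = \sum_{p=1}^{P}\sigma_p,$$

which is independent of $\boldsymbol{\Xi}_P$. If the principal eigenvectors
$\mathbf{U}_P$ are unique, the above $\mathbf{B}$ uniquely maximizes the average
KL distance $D$. This choice of $\mathbf{B}$ in general spreads out the
individual KL distances, while the occurrence of the events
$\mathcal{D}(\mathcal{H}_{\mathcal{S}}\|\mathcal{H}_{\mathcal{S}'}) = 0$ is analyzed below. The case where
$\mathbf{U}_P$ is not unique is covered by the same analysis.

Now we examine the occurrence of $\mathcal{D}(\mathcal{H}_{\mathcal{S}}\|\mathcal{H}_{\mathcal{S}'}) = 0$. Let
$\boldsymbol{\beta}_{\mathcal{S}\cup\mathcal{S}'} = (\boldsymbol{\beta}_{\mathcal{S}} - \boldsymbol{\beta}_{\mathcal{S}'})$ be a sparse vector with $|\mathcal{S}|, |\mathcal{S}'| \le s$,
and $s \le |\mathcal{I}|R$. Substituting the matrix $\mathbf{B} = \boldsymbol{\Xi}_P\mathbf{U}_P^H$ back into
(49) and simplifying the expression, the individual KL distance
is

\begin{align}
\mathcal{D}(\mathcal{H}_{\mathcal{S}}\|\mathcal{H}_{\mathcal{S}'}) &= \frac{1}{\sigma^2}\boldsymbol{\beta}_{\mathcal{S}\cup\mathcal{S}'}^H\mathbf{U}_P\boldsymbol{\Sigma}_P\mathbf{U}_P^H\boldsymbol{\beta}_{\mathcal{S}\cup\mathcal{S}'} \tag{77}\\
&= \frac{1}{\sigma^2}\left\|\boldsymbol{\Sigma}_P^{1/2}\mathbf{U}_P^H\boldsymbol{\beta}_{\mathcal{S}\cup\mathcal{S}'}\right\|^2, \quad \forall\, \mathcal{S} \neq \mathcal{S}',\ |\mathcal{S}|, |\mathcal{S}'| \le s. \tag{78}
\end{align}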



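Note that $\boldsymbol{\beta}_{\mathcal{S}\cup\mathcal{S}'}$ is a $2s$-sparse vector and $\mathcal{D}(\mathcal{H}_{\mathcal{S}}\|\mathcal{H}_{\mathcal{S}'})$ is
bounded away from zero as long as no $2s$-sparse vector
falls into the null space of the matrix $\mathbf{U}_P^H$. Minimizing the
occurrence of the event $\mathcal{D}(\mathcal{H}_{\mathcal{S}}\|\mathcal{H}_{\mathcal{S}'}) = 0$ is therefore
equivalent to maximizing the Kruskal rank of the matrix $\mathbf{U}_P^H$,
such that the matrix $\mathbf{B}$ can recover any $s$-sparse vector $\boldsymbol{\beta}_{\mathcal{S}}$
with $s$ being maximized in this process.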
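The quadratic form in (77) can be checked directly from the trace expression of the pair-wise KL distance used in Appendix C. The following is a sketch of that step, writing $\boldsymbol{\Xi}_P$ for the arbitrary non-singular $P \times P$ factor in the optimal $\mathbf{B}$ and $\boldsymbol{\beta}$ for $\boldsymbol{\beta}_{\mathcal{S}\cup\mathcal{S}'}$. It assumes that $\mathbf{M} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{U}^H$ is Hermitian with $\mathbf{U}^H\mathbf{U} = \mathbf{I}$ and that $\boldsymbol{\Sigma}_P$ is invertible; these assumptions are read off the proof rather than quoted from it.

```latex
\begin{align*}
\mathbf{B}\mathbf{M}
  &= \boldsymbol{\Xi}_P\mathbf{U}_P^H\mathbf{U}\boldsymbol{\Sigma}\mathbf{U}^H
   = \boldsymbol{\Xi}_P\boldsymbol{\Sigma}_P\mathbf{U}_P^H, \\
\mathbf{B}\mathbf{M}\mathbf{B}^H
  &= \boldsymbol{\Xi}_P\boldsymbol{\Sigma}_P\mathbf{U}_P^H\mathbf{U}_P\boldsymbol{\Xi}_P^H
   = \boldsymbol{\Xi}_P\boldsymbol{\Sigma}_P\boldsymbol{\Xi}_P^H, \\
\mathbf{M}^H\mathbf{B}^H(\mathbf{B}\mathbf{M}\mathbf{B}^H)^{-1}\mathbf{B}\mathbf{M}
  &= \mathbf{U}_P\boldsymbol{\Sigma}_P\boldsymbol{\Xi}_P^H\,
     \boldsymbol{\Xi}_P^{-H}\boldsymbol{\Sigma}_P^{-1}\boldsymbol{\Xi}_P^{-1}\,
     \boldsymbol{\Xi}_P\boldsymbol{\Sigma}_P\mathbf{U}_P^H
   = \mathbf{U}_P\boldsymbol{\Sigma}_P\mathbf{U}_P^H, \\
\frac{1}{\sigma^2}\mathrm{Tr}\!\left[\mathbf{U}_P\boldsymbol{\Sigma}_P\mathbf{U}_P^H\,
     \boldsymbol{\beta}\boldsymbol{\beta}^H\right]
  &= \frac{1}{\sigma^2}\,\boldsymbol{\beta}^H\mathbf{U}_P\boldsymbol{\Sigma}_P\mathbf{U}_P^H\boldsymbol{\beta}.
\end{align*}
```

The factor $\boldsymbol{\Xi}_P$ cancels in the third line, which is why the individual distances, like the average distance, do not depend on it.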